Abstract

One-clock priced timed games is a class of two-player, zero-sum, continuous-time games that was
defined and thoroughly studied in previous works. We show that one-clock priced timed games can be
solved in time $m 12^n n^{O(1)}$, where n is the number of states and m is the number of actions. The best
previously known time bound for solving one-clock priced timed games was $2^{O(n^2+m)}$, due to Rutkowski.
For our improvement, we introduce and study a new algorithm for solving one-clock priced timed games, 
based on the sweep-line technique from computational geometry and the strategy iteration paradigm 
from the algorithmic theory of Markov decision processes. As a corollary, we also improve the analysis of 
previous algorithms due to Bouyer, Cassez, Fleury, and Larsen; and Alur, Bernadsky, and Madhusudan. 

1 Introduction 

Priced timed automata and priced timed games are classes of one-player and two-player zero-sum real-time
games played on finite graphs that were defined and thoroughly studied in previous works [2, 4, 3, 16], among
others. Synthesizing (near-)optimal strategies for priced timed games has many practical applications
in embedded systems design; we refer to the cited papers for references.

Informally (for formal definitions, see the sections below), a priced timed game is played by two players on 
a finite directed labeled multi-graph. The vertices of the graph are called states, with some states belonging 
to Player 1 (or the Minimizer) and the other states belonging to Player 2 (or the Maximizer). We shall 
denote by n the total number of states of the game under consideration and m the total number of arcs 
(actions). Player 1 is trying to play the game to termination as cheaply as possible, while Player 2 is trying 
to make Player 1 pay as dearly as possible for playing. At any point in time, some particular state is the 
current state. The player controlling the state decides when to leave the current state and which arc to 
follow when doing so. For each arc, there is an associated cost. Each state has an associated rate of expense 
per time unit associated with waiting in the state. The above setup is further refined by the introduction of
a finite number of clocks that can informally be thought of as "stop watches". In particular, some arcs may
have an associated reset event for a clock. If the corresponding transition is taken, that clock is reset to 0.
Also, an arc may have an associated clock and time interval. When the arm of the clock is in the interval,
the corresponding transition can be taken; otherwise it cannot. With three or more clocks, the problem of
solving priced timed games is known not to be computable [6]. In this paper, we focus on the computable
case of solving one-clock priced timed games. We shall refer to these as PTGs. We shall furthermore single
out an important, particularly clean, special case of PTGs. We shall refer to this class as simple priced
timed games, SPTGs. In an SPTG, time runs from 0 to 1, the single clock is never reset, and there are no
restrictions on when transitions may be taken. A slightly more general class of games was called "[0,1]-PTGs
without resets" by Bouyer et al. [8].

[...] is supported in part by this Google Fellowship.

As is the case in general for two-player zero-sum games, informally, a priced timed game is said to have
a value v if Player 1 and Player 2 are both able to guarantee (or approximate arbitrarily well) a total cost of 
v when the game is played. The guarantees are obtained when players commit to (near-)optimal strategies 
when playing the game. Player 1, who is trying to minimize cost, may (approximately) guarantee the value 
from above, while Player 2, who is trying to maximize cost, may (approximately) guarantee the value from 
below. Clearly, in general, the value of a one-clock priced timed game will be a function v(q, t) of the initial
state q and the initial setting t of the single clock. Bouyer et al. [8] showed that the value v(q, t) exists (see footnote 1) and
that for any state q, the value function t ↦ v(q, t) is a piecewise linear function of t. By solving a game, we
mean computing an explicit description of all these functions (i.e., lists of their line segments). From such 
an object, near-optimal strategies can be synthesized. 
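
To make the output format concrete, the following minimal sketch stores one value function as a sorted list of breakpoints and evaluates it by linear interpolation. It assumes a continuous piecewise linear function on [0, 1]; the function name `evaluate` and the sample breakpoints are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def evaluate(breakpoints, t):
    """Evaluate a continuous piecewise linear function at time t.

    breakpoints: list of (time, value) pairs sorted by time, covering the
    whole domain (here [0, 1]). Consecutive pairs are the endpoints of one
    line segment.
    """
    times = [x for x, _ in breakpoints]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the domain of the function")
    # Segment whose left endpoint is the last breakpoint <= t
    # (clamped so that t equal to the right end uses the final segment).
    i = min(bisect_right(times, t) - 1, len(times) - 2)
    (x0, y0), (x1, y1) = breakpoints[i], breakpoints[i + 1]
    return y0 + (y1 - y0) * (t - x0) / (x1 - x0)


# Hypothetical value function: slope -2 on [0, 0.5], slope -1 on [0.5, 1].
v = [(0.0, 1.5), (0.5, 0.5), (1.0, 0.0)]
```

A full solution of a game would hold one such list per state q.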

Figure 1 shows an SPTG with n = 5 states. Circles are controlled by Player 1 and squares are controlled
by Player 2. States and actions have been annotated with rates and costs. If no cost is given for an action, it
has cost zero. The figure also includes graphs of the value functions. Actions are shown in black and gray,
and an optimal strategy profile is shown along the x-axis of the value functions by using these colors; more
precisely, it is the optimal strategy found by our algorithm. Waiting is shown as white.

If both players follow the indicated optimal strategies, then the play that starts with state 3 as the current 
state at time 0, is as follows: 

1. At state 3 at time 0, Player 1 waits until time | and then changes the current state to state 2. 

2. At state 2 at time |, Player 2 waits until time | and then changes the current state to state 4. 

3. At state 4 at time |, Player 1 does not wait, but immediately changes the current state to state 3. 

4. At state 3 at time |, Player 1 waits until time 1 and then changes the current state to state 1. 

5. At state 1 at time 1, Player 2 cannot wait, and immediately changes the current state to state ⊥, a
special state indicating that play has terminated.

Notice that the play waits in state 3 twice. This may seem like a counter-intuitive property of a play 
where the players play optimally. In fact, the game can be generalized to a family, such that the game with 
n states has a state that is visited 0(n) times in some optimal play. 

The contributions of this paper are the following. 

1. A polynomial time Turing-reduction from the problem of solving general PTGs to the problem of solving
SPTGs. The best previous result along these lines was a Turing-reduction from the general case to the
case of "[0,1]-PTGs without resets" by Bouyer et al. [8]. Our reduction is a polynomial time reduction
reducing solving a general PTG to solving at most (n + 1)(2m + 1) SPTGs, while the previous reduction
is an exponential time reduction.

2. A novel algorithm for solving SPTGs, based on very different techniques than previously used to solve
PTGs. In particular, our algorithm is based on applications of a technique from computational geometry:
the sweep-line technique of Shamos and Hoey [15], applied to the linear arrangement resulting
when the graphs of all value functions are superimposed in a certain way. Also, an extension of Dijkstra's
algorithm due to Khachiyan et al. [13] is a component of the algorithm. We believe that
an implementation of this algorithm and the reduction could provide an attractive alternative to the
current state-of-the-art tools for solving PTGs or various special cases (e.g., those of UPPAAL,
http://uppaal.org/, or HyTech, http://embedded.eecs.berkeley.edu/research/hytech/), which
all seem to be based on a value-iteration based algorithm independently devised by Bouyer, Cassez,
Fleury, and Larsen [7]; and Alur, Bernadsky, and Madhusudan [1]. We shall refer to that algorithm as
the BCFL-ABM algorithm.

1 Players in general cannot guarantee the value exactly, but only approximate it arbitrarily well. One of the particularly
appealing aspects of SPTGs is that they do have exactly optimal strategies! This is in contrast to both the general case and
[0,1]-PTGs without resets.
3. A worst case analysis of our algorithm as well as an improved worst case analysis of the BCFL-ABM
algorithm. Interestingly, the analysis of the algorithms is quite indirect: We analyze a different
algorithm for a subproblem (priced games, see Section 2), namely the strategy iteration algorithm, also
used to solve Markov decision processes and various other classes of two-player zero-sum games played
on graphs, and relate the analysis of this algorithm to our algorithm. To summarize the result of the
analysis, it is convenient to introduce the parameter L = L(G) of an SPTG to be the total number of
distinct time coordinates of left endpoints of the linear segments of all value functions of G. Note that
the parameter L is very natural, as L is a lower bound on the size of the explicit description of these
value functions, i.e., the output of the algorithms under consideration. We show:

(a) For an SPTG G, we have that $L(G) \le \min\{12^n, \prod_{k \in S}(|A_k| + 1)\}$, where S is the set of states and
$A_k$ the set of actions in state k. The best previous bound on L(G) was $2^{O(n^2)}$, due to Rutkowski [14].

(b) The worst case time complexity of our new algorithm is $O((m + n \log n)L)$. In particular, the
algorithm combined with the reduction solves general PTGs in time $m 12^n n^{O(1)}$. The best previous
worst case bound for any algorithm solving PTGs was $2^{O(n^2+m)}$, due to Rutkowski [14], who gave
this bound for an alternative algorithm of his own.

(c) The worst case number of iterations of the BCFL-ABM algorithm is $\min\{12^n, \prod_{k \in S}(|A_k| + 1)\} \cdot n^{O(1)}$
for general PTGs, significantly improving an analysis of Rutkowski. (An "iteration" is a
natural unit of time, specific to the algorithm; each iteration may take considerable time, as
entire graphs of value functions are manipulated during an iteration.)

(d) For the special case of PTGs with all rates being 1 (i.e., all states are equally expensive to wait
in) and all transition costs being 0 (i.e., Player 1 wants to minimize the time used), our algorithm
combined with the reduction runs in time $O(nm(\min(m, n^2) + n \log n))$. The previously best
algorithm for solving this special case (called timed reachability games) is an exponential time
algorithm due to Jurdzinski and Trivedi [12].

(e) For one-clock priced timed automata (the special case of priced timed games, where all states
belong to Player 1), our algorithm combined with the reduction runs in time $O(mn^3(\min(m, n^2) + n \log n))$.
This seems to be the best worst case bound known for solving these.
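
As a small illustration of the bound in item (a), the sketch below evaluates min{12^n, prod over k of (|A_k| + 1)} for a game given only by its action sets. The function name `event_point_bound` and the example action sets are hypothetical; the code computes the stated upper bound, not L(G) itself.

```python
from math import prod


def event_point_bound(actions):
    """Upper bound min{12^n, prod_k (|A_k| + 1)} on L(G) from item (a).

    actions: dict mapping each state k to the collection A_k of its actions.
    """
    n = len(actions)
    return min(12 ** n, prod(len(a_k) + 1 for a_k in actions.values()))


# Hypothetical game with three states and five actions.
actions = {1: ["a"], 2: ["b", "d"], 3: ["c", "e"]}
bound = event_point_bound(actions)
```

For games with few actions per state the product term is the smaller one; the 12^n term only takes over when states have many parallel actions.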

The above bounds hold if we assume a unit-cost Real RAM model of computation, which is a natural model
of computation for the algorithms considered (that previous analyses also seem to have implicitly assumed).
The algorithms can also be analyzed in Boolean models of computation (such as the log cost integer RAM), as
a rational valued input yields a rational valued output. Bounding the bit length of the numbers computed by
straightforward inductive techniques, we find that this no more than squares the above worst case complexity
bounds. The somewhat tedious analysis establishing this is not included in this version of the paper.

1.1 History of problem and related research 

Priced timed automata (or weighted timed automata) were first introduced by Alur, Torre, and Pappas [3]
and Behrmann et al. [4]. They showed that priced timed automata (viewed as one-player games) can be
solved in exponential time. Even before the introduction of priced timed automata, a special case was studied
by Alur and Dill [2]. They show this case to be PSPACE-hard even for automata where all states have rate
1 and all actions cost 0. Bouyer, Brihaye, Bruyère, and Raskin [5] showed that the problem of solving priced
timed automata is in PSPACE. I.e., the problem is PSPACE-complete when there is no limit on the number
of clocks.

Bouyer, Cassez, Fleury, and Larsen [7] and Alur, Bernadsky, and Madhusudan [1] independently introduced
the notion of priced timed games and also both considered value iteration algorithms for solving priced
timed games. Finding the value of a priced timed game with many clocks is a hard problem. Even with
only 3 clocks, finding the value becomes undecidable for priced timed games, as shown by Bouyer, Brihaye,
and Markey [6]. They improved a similar result of Brihaye, Bruyère, and Raskin [5] for 5 clocks. Hence,
various special cases have been studied. For timed reachability games, Jurdzinski and Trivedi [12] showed
the decision problem to be in EXP and EXP-complete for 2 or more clocks.

For the case with only one clock the problem becomes computable, as shown by Brihaye, Bruyère, and
Raskin [5]. Bouyer, Larsen, Markey, and Rasmussen [8] gave an explicit triple exponential time bound on
the complexity of solving this problem. This was further improved to $2^{O(n^2+m)}$ by Rutkowski [14].

1.2 Organization of paper 

Our algorithm is most naturally presented in three stages, adding more complications to the model at each
stage. First, in Section 2, we show how the strategy iteration paradigm can be used to solve priced games,
where the temporal aspects of the games are not present. In Section 3, we show how the algorithm extends
to simple priced timed games. Finally, in Section 4, we show how solving the general case of one-clock
priced timed games can be reduced to the case of simple priced timed games in polynomial time.

In terms of the list of contributions above, contribution (1) is Lemma 4.8. The algorithm of contribution (2)
is SolveSPTG, given in a figure in Section 3. Contribution (3a) is Theorem 3.10, contribution (3b) is Theorem 3.11,
contribution (3c) is Theorem 4.10, contribution (3d) is Theorem 4.11, and contribution (3e) is Theorem 4.12.

2 Priced games 

In this section, we introduce priced games. To accommodate lexicographic utilities, which will be necessary
for subsequent sections, we shall consider priced games with utilities in domains other than $\mathbb{R}$. In this
section, we fix any ordered Abelian group $(\Re, +, -, 0, \le)$ for the set of possible utilities. We let $\Re_{\ge 0}$ be
the set of non-negative elements in $\Re$. In subsequent sections, we will either have $\Re = \mathbb{R}$ or $\Re = \mathbb{R} \times \mathbb{R}$
with lexicographic order. In the latter case, we write (x, y) as $x + y\epsilon$, where we informally think of $\epsilon$ as an
infinitesimal. In addition to utilities in the group $\Re$, we also allow the utility $\infty$ (modeling non-termination).

A priced game G is given by a finite set of states $S = [n] = \{1, \dots, n\}$ and a finite set of actions
$A = [m] = \{1, \dots, m\}$. The set S is partitioned into $S_1$ and $S_2$, with $S_i$ being the set of states belonging to
Player i. Player 1 is also referred to as the minimizer and Player 2 is referred to as the maximizer. The
set A is partitioned into $(A_k)_{k \in S}$, with $A_k$ being the set of actions available in state k. Furthermore, define
$A^i = \bigcup_{k \in S_i} A_k$. Each action $j \in A$ has an associated non-negative cost $c_j \in \Re_{\ge 0} \cup \{\infty\}$ and an associated
destination $d(j) \in S \cup \{\bot\}$, where $\bot$ is a special terminal state. Note that G can be interpreted as a directed
weighted graph. In fact, a priced game can be viewed as a single source shortest path problem from the
point of view of Player 1, with the exception that an adversary, Player 2, controls some of the decisions.

Figure 2: Example of a priced game and a strategy profile $\sigma$. (Payoff labels recoverable from the figure: $u(2, \sigma) = 3$, $u(4, \sigma) = 3$, $u(5, \sigma) = 5$.)

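A positional strategy for Player i is a map $\sigma_i$ from $S_i$ to A, with $\sigma_i(k) \in A_k$ for each $k \in S_i$. A pair of strategies
(or strategy profile) $\sigma = (\sigma_1, \sigma_2)$ defines a maximal path $P_{k_0,\sigma} = (k_0, k_1, \dots)$ from each $k_0 \in S \cup \{\bot\}$, possibly
ending at $\bot$, such that $d(\sigma(k_i)) = k_{i+1}$ for all $i \ge 0$. Note that $\sigma$ can be naturally interpreted as a map from
S to A. Let $\ell(k, \sigma)$ be the length of $P_{k,\sigma}$. The path $P_{k,\sigma}$ defines a payoff $u(k, \sigma) \in \Re \cup \{\infty\}$, paid by Player
1 to Player 2, as follows:

$$u(k_0, \sigma) = \begin{cases} \sum_{i=0}^{\ell(k_0, \sigma) - 1} c_{\sigma(k_i)} & \text{if } \ell(k_0, \sigma) < \infty \\ \infty & \text{otherwise} \end{cases}$$

I.e., the payoff is the total cost of the path $P_{k,\sigma}$ from k to the terminal state $\bot$, or $\infty$ if $P_{k,\sigma}$ does not reach
$\bot$.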
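
The payoff and path length of a strategy profile can be computed by simply following the chosen actions. The sketch below is one way to do this, assuming the game is stored in plain dictionaries; the names `payoff_and_length` and `BOT`, as well as the small example game, are hypothetical and only serve as illustration.

```python
import math

BOT = "bot"  # stands for the terminal state


def payoff_and_length(cost, dest, sigma, k):
    """Follow the path from state k under the strategy profile sigma.

    cost[j], dest[j]: cost and destination of action j.
    sigma[k]: action chosen in state k (by whichever player controls k).
    Returns (payoff, number of arcs); both are math.inf if the path
    never reaches BOT.
    """
    total, steps, seen = 0, 0, set()
    while k != BOT:
        if k in seen:
            # Strategies are positional, so a repeated state means a cycle.
            return math.inf, math.inf
        seen.add(k)
        j = sigma[k]
        total += cost[j]
        steps += 1
        k = dest[j]
    return total, steps


# Hypothetical game: a: 1 -> BOT, b: 2 -> 1, c: 3 -> 2, d: 2 -> BOT.
cost = {"a": 2, "b": 1, "c": 0, "d": 4}
dest = {"a": BOT, "b": 1, "c": 2, "d": BOT}
sigma = {1: "a", 2: "b", 3: "c"}
```

Here the path from state 3 uses actions c, b and a in turn, so the payoff is the sum of their costs and the length is the number of arcs taken.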

The lower value $\underline{v}(k)$ of a state k is defined by $\underline{v}(k) = \max_{\sigma_2} \min_{\sigma_1} u(k, \sigma_1, \sigma_2)$. A strategy $\sigma_2$ is called
optimal if, for all states k, we have $\sigma_2 \in \arg\max_{\sigma_2'} \min_{\sigma_1} u(k, \sigma_1, \sigma_2')$. Similarly, the upper value $\overline{v}(k)$ of
a state k is defined by $\overline{v}(k) = \min_{\sigma_1} \max_{\sigma_2} u(k, \sigma_1, \sigma_2)$, and a strategy $\sigma_1$ is called optimal if, for all k,
$\sigma_1 \in \arg\min_{\sigma_1'} \max_{\sigma_2} u(k, \sigma_1', \sigma_2)$. Khachiyan et al. [13] observed that $\underline{v}(k) = \overline{v}(k)$, i.e., that priced games
have values $v(k) := \underline{v}(k) = \overline{v}(k)$. They also showed how to find these values and optimal strategies efficiently
using a variant of Dijkstra's algorithm. The ExtendedDijkstra algorithm is shown in Figure 3, with v being
the vector of values. Viewing a priced game as a single source shortest path problem, it is not surprising
that it can be solved by a Dijkstra-like algorithm. Intuitively, if an arc to be taken by Player 2 would be
optimal for Player 1, Player 2 will, if possible, do anything else and, informally, "delete" the arc.

Figure 2 shows an example of a priced game. The round vertices are controlled by Player 1, the minimizer,
and the square vertices are controlled by Player 2, the maximizer. Bold arrows indicate actions used by a
strategy profile $\sigma$, and dashed arrows indicate unused actions. Actions are labeled by their cost, except if
the cost is zero. Finally, the states have been annotated by the values. Note that $\sigma$ is an optimal strategy
profile.

We say that $\sigma_1$ is a best response to $\sigma_2$ if $\sigma_1 \in \arg\min_{\sigma_1'} u(k, \sigma_1', \sigma_2)$ for all $k \in S$. Similarly, $\sigma_2$ is a
best response to $\sigma_1$ if $\sigma_2 \in \arg\max_{\sigma_2'} u(k, \sigma_1, \sigma_2')$ for all $k \in S$. A strategy profile $\sigma = (\sigma_1, \sigma_2)$ is a Nash
equilibrium if $\sigma_1$ is a best response to $\sigma_2$, and $\sigma_2$ is a best response to $\sigma_1$. The following is a standard lemma
that establishes the connection between Nash equilibria and values of zero-sum games.

Lemma 2.1 If $\sigma = (\sigma_1, \sigma_2)$ is a Nash equilibrium, then $v(k) = u(k, \sigma)$ for all $k \in S$.

Function ExtendedDijkstra(G)
    (v(⊥), v(1), ..., v(n)) ← (0, ∞, ..., ∞);
    while S ≠ ∅ do
        (k, j) ← argmin_{k ∈ S, j ∈ A_k} c_j + v(d(j));
        if k ∈ S_1 or |A_k| = 1 then
            v(k) ← c_j + v(d(j));
            σ(k) ← j;
            S ← S \ {k};
        else
            A_k ← A_k \ {j};
    return (v, σ);

Figure 3: The ExtendedDijkstra algorithm of Khachiyan et al. [13] for solving priced games.

Proof: Assume that either $\sigma_1$ or $\sigma_2$ is not optimal. We will show that $(\sigma_1, \sigma_2)$ is not a Nash equilibrium
for play starting in some state of the game. Assume, without loss of generality, that $\sigma_1$ does not guarantee
Player 1 the payoff v(k) for play starting at k. There are two cases.

• Case 1: $u(k, \sigma_1, \sigma_2) \le v(k)$. In this case, Player 2 can deviate from $\sigma_2$ to play a best response to $\sigma_1$ at
state k. Since $\sigma_1$ does, by assumption, not guarantee Player 1 v(k), this will yield a larger payoff than
v(k), i.e., the deviation improves payoff for Player 2 and $(\sigma_1, \sigma_2)$ is therefore not a Nash equilibrium
for play starting at k.

• Case 2: $u(k, \sigma_1, \sigma_2) > v(k)$. In this case, Player 1 can deviate to play an optimal strategy $\sigma_1'$. By
definition of optimal, this improves his payoff to v(k) and $(\sigma_1, \sigma_2)$ is therefore not a Nash equilibrium
for play starting at k.

□
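
The ExtendedDijkstra pseudocode of Figure 3 translates almost line by line into Python. The following is a minimal sketch under the assumption that every state has at least one action; it uses a naive scan for the argmin rather than a priority queue, so it does not reflect the running time of the original algorithm. The function name, the dictionary-based game encoding and the example game are hypothetical.

```python
import math

BOT = "bot"  # stands for the terminal state


def extended_dijkstra(states1, states2, actions, cost, dest):
    """Solve a priced game; states1 / states2 belong to the min / max player.

    actions[k]: list of actions available in state k (assumed non-empty).
    Returns (v, sigma): the values and one optimal action per state.
    """
    remaining = set(states1) | set(states2)
    v = {k: math.inf for k in remaining}
    v[BOT] = 0
    avail = {k: list(actions[k]) for k in remaining}  # working copy of A_k
    sigma = {}
    while remaining:
        # Cheapest remaining (state, action) pair w.r.t. c_j + v(d(j)).
        k, j = min(((k, j) for k in remaining for j in avail[k]),
                   key=lambda kj: cost[kj[1]] + v[dest[kj[1]]])
        if k in states1 or len(avail[k]) == 1:
            v[k] = cost[j] + v[dest[j]]
            sigma[k] = j
            remaining.remove(k)
        else:
            # The maximizer has an alternative, so it "deletes" the cheap arc.
            avail[k].remove(j)
    return v, sigma


# Hypothetical game: Player 1 owns states 1 and 3, Player 2 owns state 2.
actions = {1: ["a"], 2: ["b", "d"], 3: ["c", "e"]}
cost = {"a": 2, "b": 1, "c": 0, "d": 4, "e": 5}
dest = {"a": BOT, "b": 1, "c": 2, "d": BOT, "e": BOT}
v, sigma = extended_dijkstra({1, 3}, {2}, actions, cost, dest)
```

In this example the maximizer in state 2 discards the cheap arc b and settles on d, after which the minimizer in state 3 prefers c over e.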

We shall present a different algorithm for solving priced games, following the general strategy iteration
pattern. This algorithm will be extended to priced timed games in the next sections. Let $\sigma$ be a strategy
profile. For each state $k \in S$, we define the valuation $\nu(k, \sigma) = (u(k, \sigma), \ell(k, \sigma))$. I.e., the valuation of a state
k for strategy profile $\sigma$ is the payoff for k combined with the length of the path $P_{k,\sigma}$. If $\nu(k, \sigma) = (\infty, \infty)$
we write $\nu(k, \sigma) = \infty$. We say that an action $j \in A_k$ from state k is an improving switch for Player 1 if:

$$(c_j + u(d(j), \sigma), 1 + \ell(d(j), \sigma)) < \nu(k, \sigma)$$

where we order pairs lexicographically, with the first component being most significant. I.e., an improving
switch for Player 1 either produces a path from k of smaller cost or with the same cost and smaller length.
Similarly, $j \in A_k$ is an improving switch for Player 2 if:

$$(c_j + u(d(j), \sigma), 1 + \ell(d(j), \sigma)) > \nu(k, \sigma)$$
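
The definition of improving switches can be checked mechanically, since Python compares tuples lexicographically, which matches the order on valuations. The sketch below lists the improving switches of one player among that player's own states; the helper names `valuation` and `improving_switches` and the example data are hypothetical.

```python
import math

BOT = "bot"  # stands for the terminal state


def valuation(cost, dest, sigma, k):
    """Return nu(k, sigma) = (payoff, path length), or (inf, inf) on a cycle."""
    total, steps, seen = 0, 0, set()
    while k != BOT:
        if k in seen:
            return (math.inf, math.inf)
        seen.add(k)
        j = sigma[k]
        total, steps, k = total + cost[j], steps + 1, dest[j]
    return (total, steps)


def improving_switches(actions, cost, dest, sigma, player_states, minimizer):
    """Improving switches w.r.t. sigma among the actions of player_states."""
    result = []
    for k in player_states:
        nu_k = valuation(cost, dest, sigma, k)
        for j in actions[k]:
            u, length = valuation(cost, dest, sigma, dest[j])
            candidate = (cost[j] + u, 1 + length)
            # Tuple comparison is lexicographic: payoff first, then length.
            better = candidate < nu_k if minimizer else candidate > nu_k
            if better:
                result.append(j)
    return result


# Hypothetical game: Player 1 owns states 1 and 3, Player 2 owns state 2.
actions = {1: ["a"], 2: ["b", "d"], 3: ["c", "e"]}
cost = {"a": 2, "b": 1, "c": 0, "d": 4, "e": 5}
dest = {"a": BOT, "b": 1, "c": 2, "d": BOT, "e": BOT}
sigma = {1: "a", 2: "b", 3: "e"}
switches1 = improving_switches(actions, cost, dest, sigma, [1, 3], True)
switches2 = improving_switches(actions, cost, dest, sigma, [2], False)
```

With this profile, action c lowers the valuation of state 3 for the minimizer, while action d raises the valuation of state 2 for the maximizer.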

Lemma 2.2 Let $\sigma = (\sigma_1, \sigma_2)$ be a strategy profile such that for both players i, there are no improving
switches in $A^i$. Then $\sigma_1$ and $\sigma_2$ are optimal.

Proof: By Lemma 2.1 it is enough to show that $(\sigma_1, \sigma_2)$ is a Nash equilibrium for play starting in each
state of the game.

Let $\sigma_1'$ be a best response to $\sigma_2$, and let $\sigma' = (\sigma_1', \sigma_2)$. Assume, for the sake of contradiction, that
there exists a $k_0 \in S$ such that $u(k_0, \sigma') < u(k_0, \sigma)$. Let $k_i$ be the i'th state on the path $P_{k_0,\sigma'}$. I.e.,
$k_{i+1} = d(\sigma'(k_i))$.
Either $P_{k_0,\sigma'}$ leads to the terminal state $\bot$, or $P_{k_0,\sigma'}$ is an infinite path ending in a cycle. The second
case is impossible since that would imply $u(k_0, \sigma') = \infty < u(k_0, \sigma)$.

Since $\sigma_1'$ is a best response to $\sigma_2$, we have $u(k_i, \sigma') \le u(k_i, \sigma)$ for all i. Also, since $u(\bot, \sigma') = u(\bot, \sigma) = 0$
and $u(k_0, \sigma') < u(k_0, \sigma)$, when $P_{k_0,\sigma'}$ leads to the terminal state, there must exist an index i such that
$u(k_{i+1}, \sigma') = u(k_{i+1}, \sigma)$ and $u(k_i, \sigma') < u(k_i, \sigma)$. Thus, $\sigma'(k_i)$ is an improving switch for Player 1, and since
$\sigma(k_i) \ne \sigma'(k_i)$ we have $k_i \in S_1$; a contradiction.

The argument for Player 2 is similar. Let $\sigma_2'$ be a best response to $\sigma_1$, and $\sigma' = (\sigma_1, \sigma_2')$. Assume that
$u(k_0, \sigma') > u(k_0, \sigma)$ for some $k_0 \in S$, and let $k_i$ be the i'th state along the path $P_{k_0,\sigma'}$. For the case when
$P_{k_0,\sigma'}$ leads to the terminal state, the argument is the same except with < and > interchanged.

When $P_{k_0,\sigma'}$ is an infinite path ending in a cycle we must have $u(k_0, \sigma') = \infty > u(k_0, \sigma)$. I.e., $u(k_i, \sigma)$ is
finite for all i. Recall that $c_j \ge 0$ for all $j \in A$. For all $k_i \in S_1$, $\sigma(k_i) = \sigma'(k_i)$, and, hence:

$$\nu(k_i, \sigma) = (c_{\sigma(k_i)} + u(k_{i+1}, \sigma), 1 + \ell(k_{i+1}, \sigma)) > \nu(k_{i+1}, \sigma).$$

On the other hand, for all $k_i \in S_2$, $\sigma'(k_i)$ is not an improving switch for Player 2, and, hence:

$$\nu(k_i, \sigma) \ge (c_{\sigma'(k_i)} + u(k_{i+1}, \sigma), 1 + \ell(k_{i+1}, \sigma)) > \nu(k_{i+1}, \sigma).$$

Thus, $P_{k_0,\sigma'}$ leads to a cycle for which the valuations for $\sigma$ decrease with each step; a contradiction.

□

Let $B \subseteq A$ be a set of actions such that $|B \cap A_k| \le 1$ for all $k \in S$, and, for $B \cap A_k \ne \emptyset$, let j(k, B) be
the unique action in $B \cap A_k$. Let $\sigma$ be a strategy profile, and let $\sigma[B]$ be defined as:

$$\sigma[B](k) = \begin{cases} j(k, B) & \text{if } B \cap A_k \ne \emptyset \\ \sigma(k) & \text{otherwise} \end{cases}$$

If B = {j} we also write $\sigma[j]$. If $j \in A$ is not an improving switch for one player, we say that j is weakly
improving for the other player. We say that $B \subseteq A$ is an improving set for Player i if there exists an
improving switch $j \in B$ for Player i, and for all $j \in B$, j is weakly improving for Player i.

Lemma 2.3 Let $\sigma = (\sigma_1, \sigma_2)$ be a strategy profile, and let $B \subseteq A$ be an improving set for Player 1. Then
$\nu(k, \sigma[B]) \le \nu(k, \sigma)$ for all $k \in S$, with strict inequality if $\sigma[B](k)$ is an improving switch for Player 1
w.r.t. $\sigma$. Similarly, if B is an improving set for Player 2, then $\nu(k, \sigma[B]) \ge \nu(k, \sigma)$ for all $k \in S$, with strict
inequality if $\sigma[B](k)$ is an improving switch for Player 2 w.r.t. $\sigma$.

Proof: First consider the case where B is an improving set for Player 1. Let $k_0 \in S$. We must show that
$\nu(k_0, \sigma[B]) \le \nu(k_0, \sigma)$ with strict inequality if $\sigma[B](k_0)$ is an improving switch for Player 1 w.r.t. $\sigma$. This is
clearly true if $\nu(k_0, \sigma) = \infty$. Thus, assume that $\nu(k_0, \sigma) < \infty$.

Let $k_i$ be the i'th state on the path $P_{k_0,\sigma[B]}$. Since $\sigma[B](k_i)$ is weakly improving for Player 1 we have,
for all i:

$$(c_{\sigma[B](k_i)} + u(k_{i+1}, \sigma), 1 + \ell(k_{i+1}, \sigma)) \le \nu(k_i, \sigma) \qquad (1)$$

with strict inequality exactly when $\sigma[B](k_i)$ is an improving switch for Player 1 w.r.t. $\sigma$.
From (1), and the fact that $c_j \ge 0$ for all $j \in A$, we get that:

$$\begin{aligned} \nu(k_i, \sigma) &\ge (c_{\sigma[B](k_i)} + u(k_{i+1}, \sigma), 1 + \ell(k_{i+1}, \sigma)) \\ &> (u(k_{i+1}, \sigma), \ell(k_{i+1}, \sigma)) \\ &= \nu(k_{i+1}, \sigma). \end{aligned}$$

Hence, $P_{k_0,\sigma[B]}$ does not lead to a cycle, since the valuations in $\sigma$ cannot strictly decrease along the entire
cycle.

Function StrategyIteration(G)
    while ∃ improving set B_1 ⊆ A^1 for Player 1 w.r.t. σ do
        σ ← σ[B_1];
        while ∃ improving set B_2 ⊆ A^2 for Player 2 w.r.t. σ do
            σ ← σ[B_2];
    return (u(σ), σ);

Figure 4: The StrategyIteration algorithm for solving priced games.

We next show, using backwards induction on i, that $\nu(k_i, \sigma[B]) \le \nu(k_i, \sigma)$. For the base case, $k_i = \bot$, the
statement is clearly true. Otherwise, for $i < \ell(k_0, \sigma[B])$, we get from (1) and the induction hypothesis that:

$$\begin{aligned} \nu(k_i, \sigma) &\ge (c_{\sigma[B](k_i)} + u(k_{i+1}, \sigma), 1 + \ell(k_{i+1}, \sigma)) \\ &\ge (c_{\sigma[B](k_i)} + u(k_{i+1}, \sigma[B]), 1 + \ell(k_{i+1}, \sigma[B])) \\ &= \nu(k_i, \sigma[B]). \end{aligned}$$

Note that if $j \in A_{k_i} \cap B$ is an improving switch for Player 1, the first inequality is strict.

The proof for the second case, where B is an improving set for Player 2, is similar. Let $k_0 \in S$. We show
that $\nu(k_0, \sigma[B]) \ge \nu(k_0, \sigma)$ with strict inequality if $\sigma[B](k_0)$ is an improving switch for Player 2 w.r.t. $\sigma$.
Now, this is clearly true if $\nu(k_0, \sigma[B]) = \infty$. If $\nu(k_0, \sigma[B]) < \infty$, it immediately follows that $P_{k_0,\sigma[B]}$ is of
finite length. The rest of the proof is identical, but with $\le$ and $\ge$ interchanged.

□

Lemmas 2.2 and 2.3 allow us to define the StrategyIteration algorithm as shown in Figure 4, where $u(\sigma)$ is
the vector of payoffs for $\sigma$. The algorithm is a local search algorithm, and Lemma 2.2 ensures that a local
optimum is also a global optimum. Player 1 repeatedly performs improving switches while Player 2 always
plays a best response to the current strategy of Player 1.

Theorem 2.4 The StrategyIteration algorithm correctly computes an optimal strategy profile.

Proof: It immediately follows from Lemma 2.2 that if the StrategyIteration algorithm terminates, it
correctly computes an optimal strategy profile. Indeed, in order to escape both while-loops neither player i
can have an improving switch in $A^i$ w.r.t. $\sigma$.

Let $\sigma = (\sigma_1, \sigma_2)$ be the current strategy profile at the beginning of the outer while-loop, and let $\sigma[B_1] =
(\sigma_1', \sigma_2)$. From Lemma 2.3 we know that with each iteration of the inner while-loop the valuations are
non-decreasing, with at least one state strictly increasing its valuation. Since there are only finitely many
strategies, it follows that the inner while-loop always terminates. Let the resulting strategy profile be $\sigma' =
(\sigma_1', \sigma_2')$. Then $\sigma_2'$ is optimal for the game where Player 1 is restricted to play according to $\sigma_1'$. I.e., $\sigma_2'$ is a
best response to $\sigma_1'$.

After the first iteration $\sigma_2$ is a best response to $\sigma_1$. Then all actions in $A^2$ are weakly improving for
Player 1 w.r.t. $\sigma$, and $B = B_1 \cup \{j \in A^2 \mid \exists k \in S_2 : j = \sigma_2'(k) \ne \sigma_2(k)\}$ is an improving set for Player
1. Since $\sigma' = \sigma[B]$ it follows that the valuations are non-increasing and strictly decreasing for at least one
state from $\sigma$ to $\sigma'$. Again, since there are only finitely many strategies the outer while-loop is guaranteed to
terminate.

□
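
The StrategyIteration scheme can be sketched in Python by always using singleton improving sets B = {j}, which is one admissible choice among many. As an assumption of this sketch that is not stated in Figure 4, Player 2 is also allowed to respond once before the outer loop, so that an arbitrary initial profile is handled. All function names and the example game are hypothetical.

```python
import math

BOT = "bot"  # stands for the terminal state


def valuation(cost, dest, sigma, k):
    """Return nu(k, sigma) = (payoff, path length), or (inf, inf) on a cycle."""
    total, steps, seen = 0, 0, set()
    while k != BOT:
        if k in seen:
            return (math.inf, math.inf)
        seen.add(k)
        j = sigma[k]
        total, steps, k = total + cost[j], steps + 1, dest[j]
    return (total, steps)


def find_switch(actions, cost, dest, sigma, states, minimizer):
    """Return (k, j) for some improving switch j at state k, or None."""
    for k in states:
        nu_k = valuation(cost, dest, sigma, k)
        for j in actions[k]:
            u, length = valuation(cost, dest, sigma, dest[j])
            cand = (cost[j] + u, 1 + length)
            if (cand < nu_k) if minimizer else (cand > nu_k):
                return k, j
    return None


def strategy_iteration(states1, states2, actions, cost, dest, sigma):
    """Improve sigma with single switches until neither player can improve."""
    sigma = dict(sigma)

    def respond():
        # Inner loop: Player 2 switches until it plays a best response.
        while (sw := find_switch(actions, cost, dest, sigma, states2, False)):
            sigma[sw[0]] = sw[1]

    respond()  # assumption of this sketch, see the lead-in
    while (sw := find_switch(actions, cost, dest, sigma, states1, True)):
        sigma[sw[0]] = sw[1]
        respond()
    payoffs = {k: valuation(cost, dest, sigma, k)[0] for k in sigma}
    return payoffs, sigma


# Hypothetical game: Player 1 owns states 1 and 3, Player 2 owns state 2.
actions = {1: ["a"], 2: ["b", "d"], 3: ["c", "e"]}
cost = {"a": 2, "b": 1, "c": 0, "d": 4, "e": 5}
dest = {"a": BOT, "b": 1, "c": 2, "d": BOT, "e": BOT}
payoffs, final = strategy_iteration([1, 3], [2], actions, cost, dest,
                                    {1: "a", 2: "b", 3: "e"})
```

Recomputing valuations from scratch after every switch keeps the sketch short; it is not meant to be efficient.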

3 Simple priced timed games 

A simple priced timed game (SPTG) G is given by a priced game $G' = (S_1, S_2, (A_k)_{k \in S}, (c_j)_{j \in A}, d)$, where
$S = S_1 \cup S_2$ and $A = \bigcup_{k \in S} A_k$, and for each state $i \in S$, an associated rate $r_i \in \mathbb{R}_{\ge 0}$. We assume that
$A_k \ne \emptyset$ for all $k \in S$.

An SPTG G is played as follows. A pebble is placed on some starting state $k_0$ and the clock is set to its
starting time $x_0$. The pebble is then moved from state to state by the players. The current configuration of
the game is described by a state and a time, forming a pair $(k, x) \in S \times [0, 1]$.

Assume that after t steps the pebble is on state $k_t \in S_i$, controlled by Player i, at time $x_t$, corresponding
to the configuration $(k_t, x_t)$. Player i now chooses the next action $j_t \in A_{k_t}$. Furthermore, the player also
chooses a delay $\delta_t \ge 0$ such that $x_{t+1} = x_t + \delta_t \le 1$. The pebble is moved to $d(j_t) = k_{t+1}$. The next
configuration is then $(k_{t+1}, x_{t+1})$. We write

$$(k_t, x_t) \xrightarrow{j_t, \delta_t} (k_{t+1}, x_{t+1}).$$

The game ends if $k_{t+1} = \bot$.

A play of the game is a sequence of steps starting from some configuration $(k_0, x_0)$. Let

$$\rho = (k_0, x_0) \xrightarrow{j_0, \delta_0} (k_1, x_1) \xrightarrow{j_1, \delta_1} \dots \xrightarrow{j_{t-1}, \delta_{t-1}} (k_t, x_t)$$

be a finite play such that $k_t = \bot$. The outcome of the game, paid by Player 1 to Player 2, is then given by:

$$\mathrm{cost}(\rho) = \sum_{\ell=0}^{t-1} (\delta_\ell r_{k_\ell} + c_{j_\ell}).$$

I.e., for each unit of time spent waiting at a state k Player 1 pays the rate $r_k$ to Player 2. Furthermore,
every time an action j is used, Player 1 pays the cost $c_j$ to Player 2. If $\rho$ is an infinite play the outcome is
$\infty$, and we write $\mathrm{cost}(\rho) = \infty$.
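
The cost of a finite play is a direct sum over its steps. The sketch below computes it with exact rational arithmetic and also checks that the delays keep the clock inside [0, 1]; the function name `play_cost`, the encoding of a play as (state, delay, action) triples, and the sample rates and costs are hypothetical.

```python
from fractions import Fraction


def play_cost(x0, steps, rate, cost):
    """Cost of a finite play that starts at time x0 and ends in the terminal state.

    steps: list of (state, delay, action) triples in the order they are played.
    rate[k]: cost per time unit of waiting in state k.
    cost[j]: cost of taking action j.
    """
    x, total = x0, 0
    for k, delay, j in steps:
        if delay < 0 or x + delay > 1:
            raise ValueError("delays must be non-negative and keep the clock in [0, 1]")
        total += delay * rate[k] + cost[j]
        x += delay
    return total


# Hypothetical play: wait 1/4 in state 3, take x, wait 1/2 in state 2, take y.
rate = {3: 2, 2: 5}
cost = {"x": 1, "y": 0}
steps = [(3, Fraction(1, 4), "x"), (2, Fraction(1, 2), "y")]
total = play_cost(Fraction(0), steps, rate, cost)
```

Using `Fraction` avoids rounding issues when comparing costs, in line with the remark that rational inputs yield rational outputs.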

A (positional) strategy for Player i is a map $\pi_i : S_i \times [0, 1] \to A \cup \{\lambda\}$, where $\lambda$ is a special delay action.
For every $k \in S_i$ and $x \in [0, 1)$, if $\pi_i(k, x) = \lambda$ then we require that there exists a $\delta > 0$ such that for all
$0 \le \epsilon < \delta$, $\pi_i(k, x + \epsilon) = \lambda$. Let $\delta_{\pi_i}(k, x) = \inf\{x' - x \mid x \le x' \le 1, \pi_i(k, x') \ne \lambda\}$ be the delay before the
pebble is moved when starting in state k at time x for some strategy $\pi_i$.

Player i is said to play according to $\pi_i$ if, when the pebble is in state $k \in S_i$ at time $x \in [0, 1]$, he waits
until time $x' = x + \delta_{\pi_i}(k, x)$ and then moves according to $\pi_i(k, x')$. A strategy profile $\pi = (\pi_1, \pi_2)$ is a pair of
strategies, one for each player. Let $\Pi_i$ be the set of strategies for Player i, and let $\Pi$ be the set of all strategy
profiles. A strategy profile $\pi$ is again interpreted as a map $\pi : S \times [0, 1] \to A \cup \{\lambda\}$. Furthermore, we use $\pi(x)$
to refer to the decisions at a fixed time. I.e., $\pi(x) : S \to A \cup \{\lambda\}$ is the map defined by $(\pi(x))(k) = \pi(k, x)$.

Let $\rho^{\pi}_{k,x}$ be the play starting from configuration (k, x) where the players play according to $\pi$. Define the
value function for a strategy profile $\pi = (\pi_1, \pi_2)$ and state k as: $v_k^{\pi_1,\pi_2}(x) = \mathrm{cost}(\rho^{\pi}_{k,x})$. For fixed strategies
$\pi_1$ and $\pi_2$ for Player 1 and 2, define the best response value functions for Player 2 and 1, respectively, for a
state k as:

$$v_k^{\pi_1}(x) = \sup_{\pi_2 \in \Pi_2} v_k^{\pi_1,\pi_2}(x)$$

$$v_k^{\pi_2}(x) = \inf_{\pi_1 \in \Pi_1} v_k^{\pi_1,\pi_2}(x)$$

We again define lower and upper value functions:

$$\underline{v}_k(x) = \sup_{\pi_2 \in \Pi_2} v_k^{\pi_2}(x) = \sup_{\pi_2 \in \Pi_2} \inf_{\pi_1 \in \Pi_1} v_k^{\pi_1,\pi_2}(x)$$

$$\overline{v}_k(x) = \inf_{\pi_1 \in \Pi_1} v_k^{\pi_1}(x) = \inf_{\pi_1 \in \Pi_1} \sup_{\pi_2 \in \Pi_2} v_k^{\pi_1,\pi_2}(x)$$

Note that inf and sup are used because there are infinitely many strategies. Bouyer et al. [8] showed that
$\underline{v}_k(x) = \overline{v}_k(x)$. In fact, this was shown for the more general class of priced timed games (PTGs) studied in
Section 4. Thus, every SPTG has a value function $v_k(x) := \underline{v}_k(x) = \overline{v}_k(x)$ for each state k.
A strategy $\pi_i \in \Pi_i$ is optimal from time x for Player i if:

$$\forall k \in S, x' \in [x, 1] : v_k^{\pi_i}(x') = v_k(x').$$

Strategies are called optimal if they are optimal from time 0. Similarly, a strategy $\pi_i$ is a best response to
another strategy $\pi_{-i}$ from time x if:

$$\forall k \in S, x' \in [x, 1] : v_k^{\pi_i,\pi_{-i}}(x') = v_k^{\pi_{-i}}(x').$$

A strategy profile $(\pi_1, \pi_2)$ is called a Nash equilibrium from time x if $\pi_1$ is a best response to $\pi_2$ from
time x, and $\pi_2$ is a best response to $\pi_1$ from time x. As in the case of Lemma 2.1 for priced games, any
equilibrium payoff of an SPTG is the value of the game. The exact statement is shown in Lemma 3.1. Since
the argument is standard, and similar to the proof of Lemma 2.1, it has been omitted. Just note that instead
of considering best responses, which we have not yet shown to exist for SPTGs, it suffices to use some better
strategy.

Lemma 3.1 If there exists a strategy profile $(\pi_1, \pi_2)$ that is a Nash equilibrium from time x, then
$v_k(x') = v_k^{\pi_1,\pi_2}(x')$ for all $k \in S$ and $x' \in [x, 1]$.

The existence of optimal strategies and best replies is non-trivial. We are, however, later going to prove 
the following theorem, which, in particular, implies that inf and sup can be replaced by min and max in the 
definitions of value functions. (The second half of the theorem also holds for general PTGs and was first 
established by Bouyer et al. [5] who furthermore showed that the first half fails for general PTGs.) 

Theorem 3.2 For any SPTG there exists an optimal strategy profile. Also, the value functions are
continuous piecewise linear functions.

Our proof will be algorithmic. Specifically, the algorithm SolveSPTG computes a value function of the
desired kind. Furthermore, the proof of correctness of SolveSPTG (the proof of Theorem 3.11) also yields
the existence of exactly optimal strategies.

We refer to the non-differentiable points of the value functions of G as event points of G. The number of 
distinct event points of G is an important parameter in the complexity of our algorithm for solving SPTGs. 
We denote by L(G) the total number of event points, excluding x = 1. 

Remark 3.3 Let us remark that strategies are commonly defined as maps from states and times to delays
and actions. For instance, $\tau_i : S_i \times [0,1] \to [0,1] \times A$. This is more general than our definition of strategies,
since $\tau_i(k,x) = (\delta, a)$ with $\delta > 0$ does not imply that for all $x' \in (x, x+\delta]$ we have $\tau_i(k,x') = (x + \delta - x', a)$,
whereas this implication holds for the strategies we use. We choose to use the specialized definition of
strategies because it offers a better intuition for understanding the proposed algorithm. It is easy to see that
the players cannot achieve better values by using the more general strategies. Indeed, let $\tau_i$ be some strategy
where $\tau_i(k,x) = (\delta, a)$ and $\tau_i(k,x') = (\delta', a')$, such that $[x, x+\delta] \cap [x', x'+\delta'] \neq \emptyset$. Then one of the following
two modifications will not make $\tau_i$ achieve worse values: $\tau_i(k,x) = (x' + \delta' - x, a')$ or $\tau_i(k,x') = (x + \delta - x', a)$.

3.1 Solving SPTGs 

In order to solve an SPTG we make use of a technique similar to the sweep-line technique from computational 
geometry of Shamos and Hoey [15]. Informally, we construct the value functions by moving a sweep-line
backwards from time 1 to time 0, and at each time computing the current values based on the later values. 
The approach is also similar to a technique known in game theory as backward induction. The parameter 
of the induction, the time, is a continuous parameter, however. The BCFL-ABM algorithm also applies a 
backward induction, but there, the parameter of induction is the number of transitions taken, i.e., a discrete 
parameter, leading to a value iteration algorithm. The formal development of the algorithm follows. 

If $\pi$ is a strategy profile that is optimal from time $x$, we use $\pi$ to construct a new strategy profile $\pi'$ that
is optimal from time $x' < x$. More precisely, for $\epsilon > 0$ sufficiently small, we show that there exists a fixed
optimal action for all states for both players for all points of time in the interval $[x',x)$, where $x' = x - \epsilon$.
The new profile $\pi'$ is then obtained from $\pi$ by using these actions. If the value at time $x$ is known, and the
strategies do not change in the interval $[x - \epsilon, x)$, then $v_k(x - \epsilon) = v_k(x) + \epsilon r_k$ if the players wait at state
$k$. The optimal actions can then be found by solving a priced game where waiting is associated with the
resulting cost.

Definition 3.4 For a given SPTG $G = (S_1, S_2, (A_k)_{k \in S}, c, d, r)$ and a time $x \in (0,1]$, let the priced game
$G^{x,y} = (S_1, S_2, (A'_k)_{k \in S}, c^{x,y}, d')$ be defined by:

$$\forall k \in S : \quad A'_k = A_k \cup \{\lambda_k\}$$

$$\forall j \in A'_k : \quad c^{x,y}_j = \begin{cases} v_k(x) + y r_k & \text{if } j = \lambda_k \\ c_j & \text{otherwise} \end{cases}$$

$$\forall j \in A'_k : \quad d'(j) = \begin{cases} \bot & \text{if } j = \lambda_k \\ d(j) & \text{otherwise} \end{cases}$$

We will usually let $y$ be the infinitesimal $\epsilon$, in which case we will simply denote $G^{x,\epsilon}$ by $G^x$ and $c^{x,\epsilon}$ by $c^x$.

The additional actions $\lambda_k$, for $k \in S$, should be interpreted as the additional option of waiting in the
SPTG $G$. Note that the only difference between $G^x$ and $G^{x'}$, for $x \neq x'$, is the costs of the added actions
$\lambda_k$. Hence, we may interpret a strategy profile $\sigma$ for $G^x$ as a strategy profile for $G^{x'}$. Also note that $G^x$
is identical to the priced game $G'$ defining $G$, except that for each state $k$ there is an additional action $\lambda_k$
corresponding to waiting in that state in the SPTG. Slightly abusing notation, we will interpret actions
chosen by $\sigma$ as also being actions $\pi(x)$ for $G$, and the actions $\pi(x)$ as forming a strategy profile for $G^x$.

We will also sometimes write $u(k,\sigma,G^x)$ instead of $u(k,\sigma)$ to clarify which priced game $G^x$ we consider.
Since $\epsilon$ is an infinitesimal, the payoffs of a strategy profile $\sigma$ have two components, and we let $u(k,\sigma,G^x) =
a(k,\sigma,G^x) + \epsilon b(k,\sigma,G^x)$. Note that $a(k,\sigma,G^x) = u(k,\sigma,G^{x,0})$. Furthermore, for every $x \in (0,1]$, let
$v^x(k) = a^x(k) + \epsilon b^x(k)$ be the value of state $k$ in $G^x$, and let $\sigma^x = (\sigma^x_1, \sigma^x_2)$ be an optimal strategy profile.

Let $x \in (0,1]$, let $\sigma$ be a strategy profile for $G^x$, let $k_0$ be a state, and assume that $u(k_0,\sigma,G^x) =
a(k_0,\sigma,G^x) + \epsilon b(k_0,\sigma,G^x) < \infty$. We then define $r(k_0,\sigma) = b(k_0,\sigma,G^x)$ to be the rate of the waiting state
reached from $k_0$ when players play according to $\sigma$. More precisely, we have $\rho_{k_0,\sigma} = (k_0, k_1, \ldots, k_t)$ where
$k_t = \bot$, and it will either be the case that $\sigma(k_{t-1}) = \lambda_{k_{t-1}}$, or that the last action taken was part of the
original game. The only actions whose costs have an infinitesimal component are the additional actions $\lambda_k$,
for $k \in S$. In particular, such infinitesimal costs exactly correspond to the rates of states in $G$. Hence, in
the first case we have $r(k_0,\sigma) = r_{k_{t-1}}$, and in the second case we have $r(k_0,\sigma) = 0$. In both cases $r(k_0,\sigma)$
can be interpreted as the actual rate that will be paid for waiting in the original game. I.e., if we reach $\bot$
before time 1 the rate of waiting there is 0.

Lemma 3.5 Let $\pi$ be a strategy profile for $G$ that is optimal from time $x$, and let $x' < x$. If $\pi(x'') = \sigma^x$ for
all $x'' \in [x',x)$, then $v_k^{\pi}(x') = v_k(x) + (x - x') b^x(k)$ for all $k \in S$.

Proof: Let $\rho^{\pi}_{k,x'} = (k_0,x_0) \xrightarrow{j_0,\delta_0} (k_1,x_1) \xrightarrow{j_1,\delta_1} \ldots$, and let $t$ be the maximum index such that $x_t < x$.
Since $\pi(x'') = \sigma^x$ for all $x'' \in [x',x)$, we have $\delta_\ell = 0$ for all $\ell < t$ and $\delta_t \geq x - x'$. By splitting the cost of
$\rho^{\pi}_{k,x'}$ into cost accumulated before and after time $x$, we get:

$$v_k^{\pi}(x') = \mathrm{cost}(\rho^{\pi}_{k,x'})$$
$$= \left( (x - x') r_{k_t} + \sum_{\ell=0}^{t-1} c_{j_\ell} \right) + v_{k_t}(x)$$
$$= a(k,\sigma^x,G^x) + (x - x') b(k,\sigma^x,G^x)$$
$$= a^x(k) + (x - x') b^x(k).$$
It remains to show that $a^x(k) = v_k(x)$. Recall that $a^x(k) = a(k,\sigma^x,G^x) = u(k,\sigma^x,G^{x,0})$. Observe that
since $\sigma^x$ is optimal for $G^x$ it must also be optimal for $G^{x,0}$. Indeed, if a better value can be achieved in $G^{x,0}$
then, regardless of the infinitesimal component of the payoff achieved by $\sigma^x$, it will also be better for $G^x$.
Furthermore, the value of a state $k$ in $G^{x,0}$ must be consistent with $v_k(x)$. It follows that $u(k,\sigma^x,G^{x,0}) =
v_k(x)$.

□ 

Recall that $\sigma$ is optimal for $G^x$ when there are no improving switches w.r.t. $\sigma$. Hence, if $0 < y < 1$ is
the maximum number for which there are no improving switches w.r.t. $\sigma^x$ in $G^{x,y'}$, for all $y' \in (0,y]$, then
$\sigma^x$ is optimal for all such $G^{x,y'}$. In fact, we will see that $x' = x - y$ is the next event point preceding $x$. For
every action $j \in A$ and time $x \in (0,1]$, define the function:

$$f_{j,x}(x'') = c_j + a^x(d(j)) + b^x(d(j))(x - x'').$$

Note that if $f_{j,x}(x'') < f_{\sigma^x(k),x}(x'')$, for $k \in S_1$ and $j \in A_k$, then $j$ is an improving switch with respect to
$\sigma^x$ in $G^{x,x-x''}$ for Player 1. Define NextEventPoint($G^x$) as:

$$\max\ \{0\} \cup \{x' \in [0,x) \mid \exists k \in S, j \in A_k : f_{j,x}(x) \neq f_{\sigma^x(k),x}(x) \wedge f_{j,x}(x') = f_{\sigma^x(k),x}(x')\}.$$

Note that NextEventPoint($G^x$) is well-defined, since there is only one function $f_{j,x}$ for each action $j \in A$.
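
The definition of NextEventPoint can be illustrated by a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: each line $f_{j,x}$ is represented by a hypothetical pair (value at time $x$, slope $b$), so that $f(x'') = \text{value} + b \cdot (x - x'')$, and the dictionaries `current` and `alternatives` are names introduced here for the line of the action chosen by $\sigma^x$ at each state and the lines of the other actions available there.

```python
from fractions import Fraction


def next_event_point(x, current, alternatives):
    """Largest x' in [0, x) where some alternative line meets the current one.

    current[k]      = (value at x, slope) of the line of the chosen action at k
    alternatives[k] = list of (value at x, slope) pairs for other actions at k
    A line is f(x'') = value + slope * (x - x''). Returns 0 if nothing crosses.
    """
    best = Fraction(0)
    for k, (v0, b0) in current.items():
        for v1, b1 in alternatives.get(k, []):
            if v1 == v0:
                continue  # lines agree at x: excluded by the definition
            if b1 == b0:
                continue  # parallel lines never intersect
            # Solve v0 + b0 * t = v1 + b1 * t for t = x - x''.
            t = Fraction(v1 - v0) / (b0 - b1)
            candidate = x - t
            if 0 <= candidate < x:
                best = max(best, candidate)
    return best
```

Exact rational arithmetic (`fractions.Fraction`) is used so that intersection times compare exactly; with floating point, two event points that coincide mathematically could be treated as distinct.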

Lemma 3.6 Let $x'$ = NextEventPoint($G^x$), then $\sigma^x$ is optimal for $G^{x,y}$, for all $y \in (0, x - x']$.

Proof: As discussed prior to stating the lemma, $\sigma^x$ is optimal for $G^{x,y}$ if neither player $i$ has an improving
switch $j \in A^i$ w.r.t. $\sigma^x$. Per definition there are no improving switches when $y$ is sufficiently small. Recall
that the valuation $\nu(k,\sigma^x,G^{x,y}) = (u(k,\sigma^x,G^{x,y}), \ell(k,\sigma^x,G^{x,y}))$ consists of two components. Although the
payoffs $u(k,\sigma^x,G^{x,y})$ change for different $y$, $\ell(k,\sigma^x,G^{x,y})$ remains the same. Thus, we only need to consider
for which payoffs there is an improving switch w.r.t. $\sigma^x$.
For $G^{x,y}$ we have, for all $k \in S$:

$$u(k,\sigma^x,G^{x,y}) = c^{x,y}_{\sigma^x(k)} + u(d(\sigma^x(k)),\sigma^x,G^{x,y})$$

Then $j \in A_k$, where $k \in S_1$, is an improving switch for Player 1 w.r.t. $\sigma^x$ if and only if:

$$c_j + u(d(j),\sigma^x,G^{x,y}) < u(k,\sigma^x,G^{x,y}) = c^{x,y}_{\sigma^x(k)} + u(d(\sigma^x(k)),\sigma^x,G^{x,y}) \iff f_{j,x}(x - y) < f_{\sigma^x(k),x}(x - y)$$

The same holds for Player 2 with reversed inequalities. The maximum $x' < x$ for which there is an improving
switch $j$ w.r.t. $\sigma^x$ in $G^{x,x-x'}$ appears when the lines defined by $f_{j,x}$ and $f_{\sigma^x(k),x}$ intersect. If $f_{j,x} = f_{\sigma^x(k),x}$,
$j$ never becomes an improving switch, however. NextEventPoint($G^x$) exactly equals such an intersection
point, and possibly 0 if there is none. Hence, $\sigma^x$ is an optimal strategy for $G^{x,y}$ for all $y \in (0, x - x']$, where
$x'$ = NextEventPoint($G^x$).

□ 



Lemma 3.7 Let $x'$ = NextEventPoint($G^x$), and let $\pi = (\pi_1,\pi_2)$ be a strategy profile that is optimal from
time $x$. Then the strategy profile $\pi' = (\pi'_1,\pi'_2)$, defined by:

$$\pi'(k,x'') = \begin{cases} \sigma^x(k) & \text{if } x'' \in [x',x) \\ \pi(k,x'') & \text{otherwise} \end{cases}$$

is optimal from time $x'$, and $v_k(x'') = v_k(x) + b^x(k)(x - x'')$, for $x'' \in [x',x)$ and $k \in S$.
Proof: Let us first note that for any strategy profile $\pi''$, the outcome $v^{\pi''}_{k_0}(x_0)$, for some starting configuration
$(k_0,x_0)$, only depends on the choices made by $\pi''$ in the interval $[x_0,1]$. Hence, since $\pi'$ is the same as $\pi$ in
the interval $[x,1]$, $\pi'$ is also optimal from time $x$.

Let us also note that $v_k(x'') = \infty$ for some $k \in S$ and $x'' \in [0,1]$ if and only if $v_k(x'') = \infty$ for all
$x'' \in [0,1]$, and $v^x(k) = \infty$ for all $G^x$. Indeed, the value is infinite exactly when the play has infinite length,
and this property is independent of time. Hence, costs and rates are of no importance. $v_k(x'')$ is, thus,
correctly set to $\infty$ if $v_k(x) = \infty$. Also, $\sigma^x(k)$ achieves the correct value for $k$. For the remainder of the proof
we focus on the case where $v_k(x) < \infty$. It follows immediately from Lemma 3.5 that the value function has
the correct form in the interval $[x',x)$. I.e., $v^{\pi'}_k(x'') = v_k(x) + b^x(k)(x - x'')$, for $x'' \in [x',x)$.

To finish the proof we must show that the choices of $\sigma^x$ are, indeed, optimal in the interval $[x',x)$. We
first prove that there exists a maximum time $x''$ such that $\pi'$ is optimal from time $x''$. Since $\pi'$ is optimal
from time $x$, there must exist a maximum time $x'' < x$, such that $v^{\pi'}_k(\hat{x}) = v_k(\hat{x})$ for all $\hat{x} > x''$ and all
states $k$. Assume for the sake of contradiction that $v^{\pi'}_k(x'') \neq v_k(x'')$, for some state $k$. Since the choices
of $\pi'$ are the same throughout the interval $[x',x)$, there must be a player that can do better in the time
immediately after $x''$, and we get a contradiction. Hence, $\pi'$ is optimal from time $x''$.

From Lemma 3.1 we know that it suffices to show that $(\pi'_1,\pi'_2)$ is a Nash equilibrium from time $x'$. Assume
for the sake of contradiction that there exists a strategy $\pi''_1$, a state $k_0$, and a time $x_0 \in [x',x)$, such that
$v^{\pi''_1,\pi'_2}_{k_0}(x_0) < v^{\pi'_1,\pi'_2}_{k_0}(x_0)$. Consider the finite play
$\rho^{\pi''_1,\pi'_2}_{k_0,x_0} = (k_0,x_0) \xrightarrow{j_0,\delta_0} (k_1,x_1) \xrightarrow{j_1,\delta_1} \ldots \xrightarrow{j_{t-1},\delta_{t-1}} (k_t,x_t)$,
and assume for simplicity that, if $x_t > x$, a configuration appears at time $x$. Let $\ell > 0$ be the minimum
index such that $v^{\pi''_1,\pi'_2}_{k_{\ell'}}(x_{\ell'}) = v^{\pi'_1,\pi'_2}_{k_{\ell'}}(x_{\ell'})$ for all $\ell' \geq \ell$. Note that since the play is finite, meaning that $k_t$
is the terminal state, equality holds for $\ell' = t$. Hence, $\ell$ is well-defined. Also, since $(\pi'_1,\pi'_2)$ is optimal from
time $x$, and $x$ appears in a configuration of the play if $x_t > x$, we must have $x_\ell < x$. Furthermore, since
there exists a time $x'' \in [x',x)$ from which $\pi'$ is optimal, we may assume without loss of generality that
$(\pi'_1,\pi'_2)$ is optimal from time $x_\ell$. If this is not the case we may start with a later time $x'_0 \in [x_\ell,x)$, possibly
with the other player. In particular, we have $v^{\pi''_1,\pi'_2}_k(x_\ell) = v^{\pi'_1,\pi'_2}_k(x_\ell) = v_k(x_\ell)$, for all states $k$.

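Let $x_\ell$ be defined as above, and consider the previous transition $(j_{\ell-1}, \delta_{\ell-1})$. We may view this transition
as two steps: first wait at $k_{\ell-1}$ for $\delta_{\ell-1}$ time, and then use action $j_{\ell-1}$. From the definition of $\ell$ we know
that $v^{\pi''_1,\pi'_2}_{k_{\ell-1}}(x_{\ell-1}) < v^{\pi'_1,\pi'_2}_{k_{\ell-1}}(x_{\ell-1})$. Since $\pi'$ is optimal from time $x_\ell$ the decrease must have occurred while
waiting.

We thus have $v^{\pi''_1,\pi'_2}_{k_{\ell-1}}(x_\ell) = v^{\pi'_1,\pi'_2}_{k_{\ell-1}}(x_\ell)$, but $v^{\pi''_1,\pi'_2}_{k_{\ell-1}}(x_{\ell-1}) < v^{\pi'_1,\pi'_2}_{k_{\ell-1}}(x_{\ell-1})$. Observe that $\pi''_1(k_{\ell-1}, x'') = \lambda$
for all $x'' \in [x_{\ell-1},x_\ell)$. It follows that:

$$v^{\pi''_1,\pi'_2}_{k_{\ell-1}}(x_\ell) < v^{\pi'_1,\pi'_2}_{k_{\ell-1}}(x_\ell - \epsilon). \qquad (2)$$

Hence, $\lambda_{k_{\ell-1}}$ is an improving switch w.r.t. $\sigma^x$ in $G^{x_\ell}$. On the other hand, since $\sigma^x$ is optimal for $G^{x,x-x_\ell}$ it
is also optimal for $G^{x_\ell,0}$. Hence, $x_\ell$ = NextEventPoint($G^x$), and we get a contradiction from the fact that
$x_0 < x_\ell$.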

The case for Player 2 is identical. 

Let us note that the implication in (2) does not easily work for the more general strategies described in
Remark 3.3. □

Lemma 3.7 allows us to compute optimal strategies by backward induction once the values $v_k(1)$ at
time 1 are known for all states $k \in S$. Finding $v_k(1)$ and corresponding optimal strategies from time 1 is,
fortunately, not difficult. Indeed, when $x = 1$ time does not increase further, and we simply solve the priced
game $G'$ that defines $G$. The resulting algorithm is shown in Figure 5. Note that the choice of first using
the ExtendedDijkstra algorithm and then the StrategyIteration algorithm is to facilitate the analysis
in Section 3.2. In fact, any algorithm for solving priced games could be used. By observing that SolveSPTG
repeatedly applies Lemma 3.7 to construct optimal strategies by backward induction we get the following
theorem.
Function SolveSPTG($G$)
    $(v(1), (\pi_1(1), \pi_2(1))) \leftarrow$ ExtendedDijkstra($G'$);
    $x \leftarrow 1$;
    while $x > 0$ do
        $(a^x(k) + \epsilon b^x(k), (\sigma_1, \sigma_2)) \leftarrow$ StrategyIteration($G^x$, $(\pi_1(x), \pi_2(x))$);
        $x' \leftarrow$ NextEventPoint($G^x$);
        forall $k \in S$ and $x'' \in [x',x)$ do
            $v_k(x'') \leftarrow v_k(x) + b^x(k)(x - x'')$;
            $\pi_1(k,x'') \leftarrow \sigma_1(k)$;
            $\pi_2(k,x'') \leftarrow \sigma_2(k)$;
        $x \leftarrow x'$;
    return $(v, (\pi_1, \pi_2))$;

Figure 5: Algorithm for solving a simple priced timed game $G = (G', (r_k)_{k \in S})$.
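
The control flow of the backward sweep can be sketched as follows. This is a minimal skeleton under explicit assumptions, not the algorithm of Figure 5 itself: the priced-game solver and the event-point computation are passed in as hypothetical callbacks `solve_at` and `next_event` (names introduced here), and the sketch only shows how the piecewise linear value functions are assembled from time 1 down to time 0.

```python
from fractions import Fraction


def sweep(values_at_one, solve_at, next_event):
    """Assemble piecewise linear value functions by sweeping from 1 to 0.

    values_at_one: dict state -> value at time 1
    solve_at(x, v): hypothetical callback returning (slopes, profile), where
                    slopes[k] plays the role of b^x(k)
    next_event(x, v, slopes, profile): hypothetical callback returning the
                    next event point x' with 0 <= x' < x
    Returns a list of pieces (x', x, values at x, slopes, profile).
    """
    x = Fraction(1)
    v = dict(values_at_one)
    pieces = []
    while x > 0:
        slopes, profile = solve_at(x, v)
        x_next = next_event(x, v, slopes, profile)
        pieces.append((x_next, x, dict(v), dict(slopes), profile))
        # v_k(x') = v_k(x) + b^x(k) * (x - x'), as in Lemma 3.7.
        v = {k: v[k] + slopes[k] * (x - x_next) for k in v}
        x = x_next
    return pieces


def evaluate(pieces, k, t):
    """Value of state k at time t, read off the stored linear pieces."""
    for lo, hi, v_hi, slopes, _ in pieces:
        if lo <= t < hi or (t == hi == 1):
            return v_hi[k] + slopes[k] * (hi - t)
    raise ValueError("time outside [0, 1]")
```

Termination of the loop relies on `next_event` returning a strictly smaller time on every call, which mirrors the role of the bound on the number of event points in the analysis.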

Theorem 3.8 If SolveSPTG terminates, it correctly computes the value function and optimal strategies for
both players.

Note that SolveSPTG resembles the sweep-line algorithm of Shamos and Hoey [15] for the line segment 
intersection problem. At every time x we have n ordered sets of line segments with an intersection within 
one set at the next event point $x'$ = NextEventPoint($G^x$). When handling the event point, the order of the
line segments is updated, and we move on to the next event point. 

3.2 Bounding the number of event points 

Let $G$ be an SPTG. Recall that the only difference between $G^x$ and $G^{x'}$, for $x \neq x'$, are the costs of actions
$\lambda_k$, for $k \in S$, if $v_k(x) \neq v_k(x')$. The actions available from each state are therefore the same, and a strategy
profile $\sigma$ for $G^x$ can, thus, also be interpreted as a strategy profile for $G^{x'}$. To bound the number of event
points we assign a potential to each strategy profile $\sigma$, such that the potential strictly decreases when one
of the players performs a single improving switch. Furthermore, the potential is defined independently of
the values $v_k(x)$. It then follows that the number of single improving switches performed by the SolveSPTG
algorithm is at most the total number of strategy profiles for $G^x$. We further improve this bound to show
that the number of event points is at most exponential in the number of states. This improves the previous 
bound by Rutkowski [14] . 

Let $n$ be the number of states of $G$, let $N$ be the number of distinct rates, including rate 0 for the terminal
state $\bot$. Assume that the distinct rates are ordered such that $r_1 < r_2 < \cdots < r_N$. Recall that $r(k,\sigma)$ is the
rate of the waiting state reached from $k$ in $\sigma$. Let

$$\mathrm{count}(\sigma,i,\ell,r) = |\{k \in S_i \mid \ell(k,\sigma) = \ell \wedge r(k,\sigma) = r\}|$$

be the number of states controlled by Player $i$ at distance $\ell$ from $\bot$ in $\sigma$ that reach a waiting state with rate
$r$.

For every strategy profile $\sigma$ for the priced games $G^x$, for $x \in (0,1]$, define the potential $P(\sigma) \in \mathbb{Z}^{n \times N}$ as
an integer matrix as follows.

$$P(\sigma)_{\ell,r} = \mathrm{count}(\sigma,2,\ell,r) - \mathrm{count}(\sigma,1,\ell,r)$$

I.e., rows correspond to lengths, columns correspond to rates, and entries count the number of corresponding
Player 2 controlled states minus the number of corresponding Player 1 controlled states.

We define a lexicographic ordering of potential matrices where, firstly, entries corresponding to lower rates
are of higher importance. Secondly, entries corresponding to shorter lengths are more important. Formally,
we write $P(\sigma) \prec P(\sigma')$ if and only if there exists $\ell$ and $r$ such that:

• $P(\sigma)_{\ell',r'} = P(\sigma')_{\ell',r'}$ for all $r' < r$ and $1 \leq \ell' \leq n$.

• $P(\sigma)_{\ell',r} = P(\sigma')_{\ell',r}$ for all $\ell' < \ell$.

• $P(\sigma)_{\ell,r} < P(\sigma')_{\ell,r}$.

Figure 6: Example of potential matrices of the strategy profiles from Figure 1.
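
As an illustration of the potential and its ordering, the following sketch builds the matrix from per-state data and compares two matrices. It is a sketch under assumptions: the dictionaries `length`, `rate_index` and `owner` are hypothetical inputs introduced here (distance to the terminal state, 1-based index of the reached rate in the sorted list of rates, and controlling player), and states with infinite payoff are not modelled.

```python
def potential(length, rate_index, owner, n, N):
    """Potential matrix: rows are lengths 1..n, columns are rate indices 1..N.

    Each entry counts Player 2 states minus Player 1 states with that
    (length, rate index) pair.
    """
    P = [[0] * N for _ in range(n)]
    for k in owner:
        sign = 1 if owner[k] == 2 else -1
        P[length[k] - 1][rate_index[k] - 1] += sign
    return P


def precedes(P, Q):
    """True if P comes strictly before Q in the ordering described above.

    Entries are compared column by column (lower rates first) and, within a
    column, row by row (shorter lengths first); the first differing entry
    decides. Python's list comparison is exactly this lexicographic test.
    """
    def column_major(M):
        return [M[l][r] for r in range(len(M[0])) for l in range(len(M))]

    return column_major(P) < column_major(Q)
```

For example, if a Player 1 state shortens its path to the terminal state while keeping the same rate, the entry for the new, shorter length decreases by one, so the new matrix precedes the old one.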

Figure 6 shows an example of the potential matrices of the strategy profiles shown in Figure 1. We use
the following notation:

• $\sigma^{(1)}$ is the strategy profile used at time $x = 1$,

• $\sigma^{(2)}$ is the strategy profile used at time $x \in [2/3, 1)$,

• $\sigma^{(3)}$ is the strategy profile used at time $x \in [1/3, 2/3)$,

• and $\sigma^{(4)}$ is the strategy profile used at time $x \in [0, 1/3)$.

$\sigma^{(1)}$ is also shown in Figure 3. Observe that $P(\sigma^{(1)})_{1,1} = 0$ because states 1 and 5 are controlled by Player 2
and 1, respectively, and both move directly to $\bot$, which has rate 0. Also note that the potentials do indeed
decrease for the four matrices. At each event point the strategies are updated for multiple states, however.

Lemma 3.9 Let $\sigma$ be a strategy profile that is optimal for $G^{x,0}$, for some $x \in (0,1]$. Let $j \in A^i$ be an
improving switch for Player $i$ w.r.t. $\sigma$ in the priced game $G^x$. Then $P(\sigma[j]) \prec P(\sigma)$.

Proof: Consider the game $G^x$. Recall that for every strategy profile $\sigma'$ and state $k \in S$, we let $u(k,\sigma',G^x) =
a(k,\sigma',G^x) + \epsilon r(k,\sigma')$, where $\epsilon$ is an infinitesimal. We also have $u(k,\sigma',G^{x,0}) = a(k,\sigma',G^x)$. Since $\sigma$ is
optimal for $G^{x,0}$ we must have $a(k,\sigma) = a(k,\sigma[j])$ for all $k \in S$. Indeed, otherwise $j$ would also be an
improving switch w.r.t. $\sigma$ in $G^{x,0}$, implying that $\sigma$ is not optimal for $G^{x,0}$.

Let $k$ be the state from which the action $j$ originates. It then follows that $u(k,\sigma) \neq \infty$ and $u(k,\sigma[j]) \neq \infty$.
I.e., it is not possible for exactly one of the payoffs to be infinite, and if both payoffs are infinite then $j$ would
not be an improving switch.

Assume that $i = 1$. Since $j \in A^1$ is an improving switch for Player 1 we have $\nu(k,\sigma[j]) < \nu(k,\sigma)$. It is,
thus, either the case that $r(k,\sigma[j]) < r(k,\sigma)$, or that $r(k,\sigma[j]) = r(k,\sigma)$ and $\ell(k,\sigma[j]) < \ell(k,\sigma)$. In both
cases the most significant entry $\ell, r$ for which $P(\sigma)_{\ell,r} \neq P(\sigma[j])_{\ell,r}$ is $\ell = \ell(k,\sigma[j])$ and $r = r(k,\sigma[j])$. Indeed,
all states with new valuations in $\sigma[j]$ move through state $k$ and, thus, have the same rates but larger lengths.
Since $i = 1$, the state $k$ is subtracted from this entry in $P(\sigma[j])$, so we have $P(\sigma[j])_{\ell,r} < P(\sigma)_{\ell,r}$ and, thus,
$P(\sigma[j]) \prec P(\sigma)$.

The case for $i = 2$ is similar. $j \in A^2$ is an improving switch for Player 2, implying that either $r(k,\sigma[j]) >
r(k,\sigma)$, or $r(k,\sigma[j]) = r(k,\sigma)$ and $\ell(k,\sigma[j]) > \ell(k,\sigma)$. The most significant entry $\ell, r$ for which $P(\sigma)_{\ell,r} \neq
P(\sigma[j])_{\ell,r}$ is then $\ell = \ell(k,\sigma)$ and $r = r(k,\sigma)$. Since $i = 2$, the state $k$ is no longer counted in this entry of
$P(\sigma[j])$, so we again have $P(\sigma[j])_{\ell,r} < P(\sigma)_{\ell,r}$ and subsequently $P(\sigma[j]) \prec P(\sigma)$.

□ 

Theorem 3.10 The total number of event points for any SPTG $G$ with $n$ states is
$L(G) \leq \min\{12^n, \prod_{k \in S}(|A_k| + 1)\}$. Furthermore, if there is only one player, $L(G) = O(n^2)$.

Proof: Consider the variant of the SolveSPTG algorithm where StrategyIteration only performs single
improving switches for both players. I.e., when solving $G^x$, for some $x \in (0,1]$, Player 1 performs one
improving switch, then Player 2 repeatedly performs single improving switches as long as possible, and then
the process is repeated. The resulting optimal strategy profile $\sigma^x$ is then used as the starting point for
solving the next priced game $G^{x'}$, for $x'$ = NextEventPoint($G^x$).

Once the initial strategy profile $\sigma = (\pi_1(1), \pi_2(1))$ is found, any strategy profile $\sigma'$ that is subsequently
produced by the StrategyIteration algorithm at some time $x$ is optimal for the priced game $G^{x,0}$. I.e.,
$\sigma^x$ is optimal for all $G^{x''}$ with $x'' \in (x',x]$, where $x'$ = NextEventPoint($G^x$). In particular, the payoffs
resulting from $\sigma^x$ and $\sigma^{x'}$ in $G^{x'}$ only differ by some second order term. Hence, we can apply Lemma 3.9
to the strategy profiles, and conclude that the potential decreases with every improving switch. From this
we immediately get that the total number of strategy profiles in $G^x$, $\prod_{k \in S}(|A_k| + 1)$, is an upper bound on
$L(G)$.

We next show that $L(G) \leq 12^n$. A matrix $P \in \mathbb{Z}^{n \times N}$ corresponding to a legal potential can always be
constructed in the following way. Let each entry $(\ell, r)$ be associated with a set $S_{\ell,r}$ of corresponding states.
I.e., $S_{\ell,r}$ contains the states for which it takes $\ell$ moves to reach $\bot$ in the priced game, and the rate encountered
is the $r$'th smallest rate of the game. Pick a non-empty subset of the columns $C \subseteq \{1, \ldots, N\}$. This will
be the columns, such that in column $r$, there is an $\ell$ such that $\mathrm{count}(\sigma,2,\ell,r) \neq 0$ or $\mathrm{count}(\sigma,1,\ell,r) \neq 0$.
This can be done in at most $2^N - 1 < 2^{n+1}$ ways. Next, assign states to the sets of the entries. If $S_{\ell,r} \neq \emptyset$,
then we must also have $S_{\ell',r} \neq \emptyset$ for all $\ell' < \ell$, by definition. This allows us to assign states to sets in an
ordered way. Let $(\ell, r)$ be the current entry starting from $\ell = 1$ and $r = \min C$. The current entry will be
lexicographically increasing in $(r, \ell)$. Repeatedly add a state from either $S_1$ or $S_2$ to $S_{\ell,r}$ and update the current
entry in one of the following three ways:

• Do nothing: More states will be assigned to $S_{\ell,r}$.

• Move to the next row: No more states will be assigned to $S_{\ell,r}$, but some will be assigned to $S_{\ell+1,r}$.

• Move to the beginning of the next column of $C$: No more states will be assigned to $S_{\ell',r}$ for any $\ell'$.

There are $n$ (one for each state in the game) such iterations, and in each iteration there are at most six
possible options. Hence, the states can be added in at most $6^n$ ways. Furthermore, we do not need to update
the current entry after the last state has been added, which saves us a factor of 3. The total number of
possible matrices $P$ is, thus, at most $12^n$.
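
The arithmetic behind the final count can be spelled out as a worked inequality. This only restates the quantities from the argument above (at most $2^N - 1$ column subsets with $N \leq n + 1$, at most $6^n$ assignment sequences, and the saved factor of 3):

```latex
\[
  (2^{N} - 1) \cdot \frac{6^{n}}{3}
  \;<\; 2^{n+1} \cdot \frac{6^{n}}{3}
  \;=\; \frac{2}{3} \cdot 2^{n} \cdot 6^{n}
  \;=\; \frac{2}{3} \cdot 12^{n}
  \;<\; 12^{n}.
\]
```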

When there is only one player i the argument becomes much simpler. Observe that the rates change 
monotonically when going back in time: if i = 1 the rates decrease, and if i = 2 the rates increase. 
Furthermore, at every event point at least one state changes rate. Hence, there can be at most nN < n(n+ 1) 
event points. 

□ 

Theorem 3.11 SolveSPTG solves any SPTG $G$ in time $O(m \cdot \min\{12^n, \prod_{k \in S}(|A_k| + 1)\})$ in the unit cost
model, where $n$ is the number of states and $m$ is the number of actions. Alternatively, the variant of
SolveSPTG that uses the ExtendedDijkstra algorithm instead of StrategyIteration solves $G$ in time
$O(L(G)(m + n \log n))$.

Proof: The correctness of SolveSPTG follows from Theorems 3.8 and 3.10.
For the first bound we get from the proof of Theorem 3.10 that, in fact, not only the number of event
points, but also the number of single improving switches is bounded by $\min\{12^n, \prod_{k \in S}(|A_k| + 1)\}$. Valuations for
a strategy profile $\sigma$ can be computed in time $O(n)$, and then the next event point can be computed in time
$O(m)$. I.e., for each $k \in S$ we find the next event point at time $x$ among the intersections of $f_{\sigma(k),x}$ and $f_{j,x}$,
for $j \in A_k$.

For the second bound we are using the ExtendedDijkstra algorithm of Khachiyan et al. [13] instead of
StrategyIteration in the inner while-loop. The ExtendedDijkstra algorithm has the same complexity as
Dijkstra's algorithm$^2$. Fredman and Tarjan [10] showed that, using Fibonacci heaps, Dijkstra's algorithm can
solve the single source shortest path problem for a graph with $n$ vertices and $m$ edges in time $O(m + n \log n)$.

□ 

Theorem 3.2 follows as a corollary of Theorem 3.11 since SolveSPTG is always guaranteed to compute
optimal strategies, and the resulting value functions are continuous piecewise linear functions. 

4 Priced timed games 

One-clock priced timed games (PTGs) extend SPTGs in two ways. First, actions are associated with time 
intervals during which they are available, and second, certain actions will cause the time to be reset to zero. 
Also, we do not require the time to run from zero to one. 

Formally, a PTG $G$ can be described by a tuple $G = (S_1, S_2, (A_k)_{k \in S}, (c_j)_{j \in A}, d, (r_k)_{k \in S}, (I_j)_{j \in A}, R)$,
where $S = S_1 \cup S_2$ and $A = \bigcup_{k \in S} A_k$. The complete description of the individual components of $G$ is as
follows. Note that only the last two components are new compared to priced games and SPTGs. 

• $S_i$, for $i \in \{1, 2\}$, is a set of states controlled by Player $i$.

• $A_k$, for $k \in S$, is a set of actions available from state $k$.

• $c_j \in \mathbb{R}_{\geq 0} \cup \{\infty\}$, for $j \in A$, is the cost of action $j \in A$.

• $d : A \to S \cup \{\bot\}$ is a mapping from actions to destinations with $\bot$ being the terminal state.

• $r_k \in \mathbb{R}_{\geq 0}$, for $k \in S$, is the rate for waiting at state $k$.

• $I_j$, for $j \in A$, is the existence interval (a real interval) of action $j$ during which it is available.

• $R \subseteq A$ is the set of reset actions.

To simplify the statements of many of the remaining lemmas we let $(n, m, r, d)$-PTG be the class of all
PTGs consisting of $n$ states, $r$ of which are the destination of some reset action, $m$ actions and $d$ distinct
endpoints of existence intervals.

We let $e(I)$ be the set of endpoints of interval $I$, and define $M = \max \bigcup_{j \in A} e(I_j)$. I.e., after time $M$ no
actions are available and the game must end. Note that PTGs are often defined with existence intervals for
both states and actions. For convenience, we decided to omit this feature since it is not difficult to translate
between the two versions.

PTGs are played like SPTGs with the exception that using a reset transition resets the time to zero and
that the actions must be available when used. We, thus, again operate with configurations $(k, x) \in S \times [0, M]$
corresponding to a pebble being placed on state $k$ at time $x$. The player controlling state $k$ chooses an action
$j \in A_k$ and a delay $\delta \geq 0$, such that $j$ is available at time $x + \delta$. I.e., $x + \delta \in I_j$. We assume for simplicity
that such an action is always available. The pebble is then moved to state $d(j)$, the time is incremented to
$x + \delta$ if $j \notin R$ and reset to zero otherwise, and the play continues. The game ends when the terminal state
$\bot$ is reached.
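
A single step of a play can be sketched as follows. This is a minimal illustration under assumptions that are not fixed by the text: existence intervals are taken to be closed, the step cost is taken to be the waiting cost plus the action cost (as for SPTGs), and the `Action` record and `step` function are names introduced here.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Action:
    cost: Fraction          # c_j
    dest: str               # d(j); the string "bot" stands for the terminal state
    interval: tuple         # (lo, hi), assumed closed existence interval I_j
    reset: bool = False     # True if j is a reset action


def step(k, x, action, delay, rates):
    """Apply one move from configuration (k, x).

    Returns (next state, next time, cost of the step). Raises ValueError if
    the delay is negative or the action is unavailable at time x + delay.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    t = x + delay
    lo, hi = action.interval
    if not (lo <= t <= hi):
        raise ValueError("action is not available at time x + delay")
    cost = delay * rates[k] + action.cost   # waiting cost plus action cost
    next_time = Fraction(0) if action.reset else t
    return action.dest, next_time, cost
```

Note that availability is checked before the reset is applied, matching the requirement $x + \delta \in I_j$.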

2 To get this bound for the Extended Dijkstra's algorithm, actions of the maximizer should not be inserted into the priority
queue. Instead, a choice of action for the maximizer for a state is fixed when the values of all possible successors of that state
are known.
We again let a play be a sequence of legal steps starting from some configuration $(k_0,x_0)$:

$$\rho = (k_0,x_0) \xrightarrow{j_0,\delta_0} (k_1,x_1) \xrightarrow{j_1,\delta_1} \ldots$$

where, for all $\ell \geq 0$, $x_\ell + \delta_\ell \in I_{j_\ell}$, and if $j_\ell \in R$ then $x_{\ell+1} = 0$. The costs of infinite plays and finite plays
ending at the terminal state $\bot$ are defined analogously to SPTGs.

Let Plays($i$) be the set of finite plays ending at a state controlled by Player $i$. Note that $\rho \in$ Plays($i$)
specifies the current state and time, as well as the history leading to this configuration. A (positional)
strategy for Player $i$ is again defined as a map $\pi_i : S_i \times [0, M] \to A \cup \{\lambda\}$ from configurations of the game
to choices of actions. Again, for every $k \in S_i$ and $x \in [0, 1)$, if $\pi_i(k, x) = \lambda$ then we require that there exists
a $\delta > 0$ such that for all $0 \leq \epsilon < \delta$, $\pi_i(k, x + \epsilon) = \lambda$. Let $\delta^{\pi_i}(k,x) = \inf\{x' - x \mid x \leq x' \leq 1, \pi_i(k,x') \neq \lambda\}$
be the delay before the pebble is moved when starting in state $k$ at time $x$ for some strategy $\pi_i$. Previous
works have defined such strategies in other ways, see Remark 3.3.

A history-dependent strategy for Player $i$ is a map $\tau_i :$ Plays($i$) $\to (A, \mathbb{R}_{\geq 0})$ that maps every play $\rho$ ending
in a state $k \in S_i$ to an action $j \in A_k$ and a delay $t$. We will only use history-dependent strategies in the
proof of one lemma (Lemma 4.2). Note that history-dependent strategies generalize positional strategies.
We denote the set of history-dependent strategies for Player $i$ by $T_i(G)$, where $G$ is omitted if it is clear from
the context. Similarly, the set of positional strategies for Player $i$ is denoted by $\Pi_i(G)$.

Let $\rho^{\tau_1,\tau_2}_{k,x}$ be the play generated when, starting from $(k,x)$, the players play according to $\tau_1$ and $\tau_2$. The
corresponding value function is again defined as:

$$v^{\tau_1,\tau_2}_k(x) = \mathrm{cost}(\rho^{\tau_1,\tau_2}_{k,x}).$$

Best response, lower and upper value functions are again defined as: 



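$$v^{\tau_1}_k(x) = \sup_{\tau_2 \in T_2} v^{\tau_1,\tau_2}_k(x)$$

$$v^{\tau_2}_k(x) = \inf_{\tau_1 \in T_1} v^{\tau_1,\tau_2}_k(x)$$

$$\underline{v}_k(x) = \sup_{\tau_2 \in T_2} v^{\tau_2}_k(x) = \sup_{\tau_2 \in T_2} \inf_{\tau_1 \in T_1} v^{\tau_1,\tau_2}_k(x)$$

$$\overline{v}_k(x) = \inf_{\tau_1 \in T_1} v^{\tau_1}_k(x) = \inf_{\tau_1 \in T_1} \sup_{\tau_2 \in T_2} v^{\tau_1,\tau_2}_k(x)$$
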

Bouyer et al. [8] proved the following fundamental theorem. 

Theorem 4.1 (Bouyer et al. [8]) For every PTG $G$, there exist value functions $v_k(x) := \underline{v}_k(x) = \overline{v}_k(x)$.
Moreover, a player can get arbitrarily close to the values even when restricted to playing positional strategies:

$$\underline{v}_k(x) = \sup_{\pi_2 \in \Pi_2} \inf_{\tau_1 \in T_1} v^{\tau_1,\pi_2}_k(x) = \sup_{\pi_2 \in \Pi_2} \inf_{\pi_1 \in \Pi_1} v^{\pi_1,\pi_2}_k(x)$$

$$\overline{v}_k(x) = \inf_{\pi_1 \in \Pi_1} \sup_{\tau_2 \in T_2} v^{\pi_1,\tau_2}_k(x) = \inf_{\pi_1 \in \Pi_1} \sup_{\pi_2 \in \Pi_2} v^{\pi_1,\pi_2}_k(x)$$

For the purpose of solving PTGs it, thus, suffices to consider positional strategies. In the remainder of 
this section we will therefore restrict ourselves to positional strategies unless otherwise specified. 
A strategy $\pi_i \in \Pi_i$ is $\epsilon$-optimal for Player $i$ for $\epsilon > 0$ if:

$$\forall k \in S, x \in [0, M] : \quad |v^{\pi_i}_k(x) - v_k(x)| < \epsilon.$$

Since PTGs have value functions, e-optimal strategies always exist for both players, for any e > 0. Optimal 
strategies do not always exist, as shown by Bouyer et al. [8]. Indeed, consider the PTG shown in Figure 7.
State 1 is controlled by Player 1, the minimizer, and state 2 is controlled by Player 2, the maximizer. The
value functions are shown on the right. Two actions leading to the terminal state are available from state 2
at time 0 and 1, respectively. Since the rate of state 2 is 0, Player 2 picks the more expensive action with
cost $c_2 = 1$ at time 0, and at other times Player 2 waits until time 1 and picks the cheaper action with cost
$c_3 = 0$. From state 1 exactly one action is available at all times, and since the rate is 1, Player 1 leaves the
state as soon as possible, only not at time 0. Since no strategy can implement leaving as soon as possible
there is no optimal strategy for Player 1. More precisely, for every waiting time $\delta$ chosen by Player 1 at time
0, there exists a smaller waiting time $\delta' < \delta$ that achieves a better value.

Figure 7: Example of a PTG with no optimal strategy profile.
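
The non-existence of an optimal strategy in this example can be checked numerically. The sketch below rests on assumptions: the figure itself is not reproduced here, so the cost `c1` of the single action from state 1 to state 2 is a hypothetical parameter (set to 0), and the value of state 2 is taken from the description above (1 at time 0, and 0 afterwards).

```python
from fractions import Fraction


def value_state2(x):
    """Value of state 2 as described in the text: 1 at time 0, else 0."""
    return Fraction(1) if x == 0 else Fraction(0)


def payoff_from_state1(delay, c1=Fraction(0)):
    """Cost for Player 1 when waiting `delay` in state 1 from time 0.

    State 1 has rate 1; c1 is an assumed cost for the action to state 2.
    """
    return delay * 1 + c1 + value_state2(delay)


# Halving the delay always improves the payoff, but delay 0 is worse again,
# so the infimum 0 is approached and never attained.
delays = [Fraction(1, 2 ** i) for i in range(1, 6)]
payoffs = [payoff_from_state1(d) for d in delays]
```

Every positive delay yields a payoff equal to the delay itself, whereas leaving immediately yields 1, which is the discontinuity that rules out an exactly optimal strategy.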

We reduce solving any PTG to solving a number of SPTGs. The first step towards this goal is to remove 
reset actions by extending the game. 

Lemma 4.2 Let $G$ be a $(n,m,r,d)$-PTG. Solving $G$ can be reduced to solving $r + 1$ $(n,m,0,d)$-PTGs.

Proof: Let $\pi = (\pi_1,\pi_2)$ be any strategy profile, and suppose the play $\rho^{\pi}_{k_0,x_0}$ is using two reset actions
$j, j' \in R$ leading to the same state $d(j) = d(j') = k$. Then the configuration $(k, 0)$ appears twice in
$\rho^{\pi}_{k_0,x_0}$, and since strategies are history-independent it appears an infinite number of times. It follows that
$v^{\pi}_{k_0}(x_0) = \infty$. By the pigeonhole principle we get that if a play $\rho^{\pi}_{k_0,x_0}$ uses $r + 1$ reset actions, then some
state is visited twice by some reset actions, and therefore $v^{\pi}_{k_0}(x_0) = \infty$.

Thus, when playing $G$ we may augment configurations by the number of times a reset action has been
used, and once this number reaches $r + 1$ we may assume without loss of generality that the value is infinite.
This defines a new PTG $G'$ with states $S' = S \times \{0, \ldots, r\}$ and actions $A' = A \times \{0, \ldots, r\}$ in the following
natural way. For $j \in A$ and $\ell \in \{0, \ldots, r\}$, destinations and costs are defined as

$$d'(j,\ell) = \begin{cases} (d(j), \ell + 1) & \text{if } j \in R \text{ and } \ell < r \\ \bot & \text{if } j \in R \text{ and } \ell = r \\ (d(j), \ell) & \text{otherwise} \end{cases}$$

$$c'_{(j,\ell)} = \begin{cases} \infty & \text{if } j \in R \text{ and } \ell = r \\ c_j & \text{otherwise} \end{cases}$$

while rates, existence intervals and reset actions are the same as for the corresponding states and actions
of $G$. Plays and value functions of $G'$ will be denoted by $\rho'$ and $v'$, respectively. We will show that for all
$(k,x) \in S \times [0,M]$, $v'_{(k,0)}(x) = v_k(x)$.

Every strategy profile $\pi'$ for $G'$ can be interpreted as a history-dependent strategy profile for $G$ in the
following way: For every play that can be achieved by moving according to $\pi'$ make the corresponding choice
in $\pi'$, for other plays make arbitrary choices. Also, every positional strategy profile $\pi$ for $G$ can be interpreted
as a strategy profile for $G'$ by using the same choices regardless of the number of encountered reset actions.
With these interpretations we see that $\Pi_i(G) \subseteq \Pi_i(G') \subseteq T_i(G)$.

For all configurations $(k,x) \in S \times [0, M]$, if $\rho'^{\pi}_{(k,0),x}$ uses at most $r$ reset actions, then $\mathrm{cost}(\rho'^{\pi}_{(k,0),x}) =
\mathrm{cost}(\rho^{\pi}_{k,x})$, since the actions encountered in the two games have the same costs. If $\rho'^{\pi}_{(k,0),x}$ uses more than
$r$ reset actions, then $\mathrm{cost}(\rho'^{\pi}_{(k,0),x}) = \infty \geq \mathrm{cost}(\rho^{\pi}_{k,x})$. Hence, we always have $v'^{\pi}_{(k,0)}(x) \geq v^{\pi}_k(x)$. Using
Theorem 4.1 it follows that:

$$v'_{(k,0)}(x) = \inf_{\pi_1 \in \Pi'_1} \sup_{\pi_2 \in \Pi'_2} v'^{\pi_1,\pi_2}_{(k,0)}(x) \geq \inf_{\pi_1 \in \Pi'_1} \sup_{\pi_2 \in \Pi'_2} v^{\pi_1,\pi_2}_k(x)$$
$$= \inf_{\pi_1 \in \Pi_1} \sup_{\pi_2 \in \Pi_2} v^{\pi_1,\pi_2}_k(x) = v_k(x)$$

Here $\Pi'_i$ denotes $\Pi_i(G')$. The first inequality follows from the costs being larger in $G'$, and the next equality
follows from Theorem 4.1: the same values can be obtained using only positional strategies in $G$.

Next we show that v'q. q \(x) < Ufc(x), implying that v'^ k \(x) = Vk(x). This is clearly true if Vk(x) = oo, 
thus, we may assume that Vk{x) < oo. In particular, e-optimal strategies do not generate plays with more 
than r reset actions in neither G nor G 1 . We see that: 

Vk{%) = hif sup vZ 1 ' m (x) = inf sup i^ 1 '*^^) 
Tieni T26 n 2 Tien; ^ G n 2 

= inf sup v'J^l 2 {x) = v[ k0) (x) 
Tr' 2 en' 2 

For the second equality we use Theorem 14.11 the values do not change even if certain history-dependent 
strategies are available. For the third equality we use the assumption that the values are finite. This implies 
that for relevant strategy profiles the values of the two games are the same. 

We now know that in order to find the value $v_k(x)$ in G it suffices to find $v'_{(k,0)}(x)$ in G'. To do this
we exploit the special structure of G'. We observe that states $(k,\ell) \in S \times \{0, \dots, r\}$ do not depend
on states $(k',\ell')$ with $\ell' < \ell$. Thus, the game can be solved using backward induction on $\ell$. In
particular, when $v'_{(k,\ell+1)}(x)$ is known for all k and x, then the subgame consisting of states $(k,\ell)$,
for $k \in S$, can be viewed as an independent PTG with no reset actions. That is, reset actions lead to states
with known values and can thus be thought of as going directly to the terminal state with an appropriate cost.
Each subgame has n states and m actions, and there are r + 1 such subgames.

□ 
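
The layered construction of G' in the proof can be sketched as follows. This is a minimal illustration
under our own assumptions: the `Action` record and the function `unfold_resets` are hypothetical names,
`None` stands for the terminal state, and rates and existence intervals are omitted because they are copied
unchanged.

```python
from dataclasses import dataclass
from math import inf
from typing import Dict, Hashable, Optional, Tuple


@dataclass(frozen=True)
class Action:
    source: Hashable
    dest: Optional[Hashable]  # None stands for the terminal state
    cost: float
    is_reset: bool


def unfold_resets(actions: Dict[str, Action], r: int) -> Dict[Tuple[str, int], Action]:
    """Build the actions A x {0..r} of G' from the actions of G."""
    unfolded = {}
    for name, a in actions.items():
        for level in range(r + 1):
            if a.is_reset and level < r:
                # A reset action moves one level up.
                dest, cost = (a.dest, level + 1), a.cost
            elif a.is_reset:
                # Too many resets: go to the terminal state at infinite cost.
                dest, cost = None, inf
            else:
                # Other actions stay on the same level (assumed: terminal stays terminal).
                dest = None if a.dest is None else (a.dest, level)
                cost = a.cost
            unfolded[(name, level)] = Action((a.source, level), dest, cost, a.is_reset)
    return unfolded
```

Since level $\ell$ only refers to levels $\ell$ and $\ell + 1$, the levels can be solved from r down to 0.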

Now we just need to show how to solve PTGs without resets using SPTGs. 

We will show the statement using three reductions. First we will reduce PTGs without resets to the subclass
of such games where, for each action $j \in A$, we have $I_j \in \{(0, 1), [1, 1]\}$. Afterwards we will reduce
further to the subclass of PTGs where, for each action $j \in A$, we have $I_j \in \{[0, 1], [1, 1]\}$. At the end
we will reduce those to SPTGs.

Let X be the set which consists of 0 and the endpoints of existence intervals of G. Let the i'th largest
element in X be $M_i$. Note that $M_1 = M$.
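
As a small illustration, the points $M_i$ can be computed as below. The function name `breakpoints` is ours,
and intervals are given only by their two endpoints, since open and closed endpoints contribute the same
points to X.

```python
from typing import Iterable, List, Tuple


def breakpoints(intervals: Iterable[Tuple[float, float]]) -> List[float]:
    """Return M_1 > M_2 > ... as a list, so that result[i - 1] is M_i."""
    points = {0.0}
    for lo, hi in intervals:
        points.update((float(lo), float(hi)))
    return sorted(points, reverse=True)
```

For the intervals (0, 2), (1, 3) and [3, 3] this gives $M_1 = 3$, $M_2 = 2$, $M_3 = 1$ and $M_4 = 0$.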

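We will now define some functions on PTGs. For a PTG $G = (S_1, S_2, (A_k)_{k \in S}, c, d, (r_k)_{k \in S}, (I_j)_{j \in A}, R)$,
where $R = \emptyset$, an $x \in \mathbb{R}$ and a vector $v \in (\mathbb{R}_{\geq 0} \cup \{\infty\})^n$, let the priced
game $G^{v,x} = (S_1, S_2, (A'_k)_{k \in S}, c', d')$ be defined by:

$$\forall k \in S: A'_k = \{j \in A_k \mid x \in I_j\} \cup \{\bot_k\}$$

$$\forall j \in A'_k: c'_j = \begin{cases} v_k & \text{if } j = \bot_k \\ c_j & \text{otherwise} \end{cases}
\qquad
d'_j = \begin{cases} \bot & \text{if } j = \bot_k \\ d_j & \text{otherwise} \end{cases}$$

The game $G^{v,x}$ is similar to the priced game defined in Definition 3.4. The intuition is that $G^{v,x}$ can
model a specific moment in time.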

Definition 4.3 For a given PTG $G = (S_1, S_2, (A_k)_{k \in S}, c, d, (r_k)_{k \in S}, (I_j)_{j \in A}, R)$, an
$x \in \mathbb{R}$ and a vector $v \in (\mathbb{R}_{\geq 0} \cup \{\infty\})^n$, let the SPTG
$G^{v,x,d} = (S_1, S'_2, (A'_k)_{k \in S_1 \cup S'_2}, c', d', (r'_k)_{k \in S_1 \cup S'_2})$ be defined by:

$$S'_2 = S_2 \cup \{\max\}$$

$$\forall k \in S_1 \cup S'_2: A'_k = \begin{cases} \{\bot_{\max}\} & \text{if } k = \max \\ \{j \in A_k \mid x \in I_j\} \cup \{\bot_k\} & \text{otherwise} \end{cases}$$

$$\forall j \in A'_k: c'_j = \begin{cases} v_k & \text{if } j = \bot_k \text{ and } k \neq \max \\ 0 & \text{if } k = \max \\ c_j & \text{otherwise} \end{cases}$$

$$\forall j \in A'_k: d'_j = \begin{cases} \max & \text{if } k \in S_1 \text{ and either } j = \bot_k \text{ or } d_j = \bot \\ \bot & \text{if } k \in S_2 \text{ and either } j = \bot_k \text{ or } d_j = \bot \\ \bot & \text{if } k = \max \\ d_j & \text{otherwise} \end{cases}$$

$$r'_k = \begin{cases} d \cdot \max_{k' \in S} \{r_{k'}\} & \text{if } k = \max \\ r_k \cdot d & \text{otherwise} \end{cases}$$

The game $G^{v,x,d}$ is constructed from the proofs of Lemma 4.5, Lemma 4.6 and Lemma 4.7. The intuition
is that the game can model an arbitrary length interval, in the original game, where no action changes status
between available and unavailable.

Function SolvePTG(G)
    v(M_1) ← ExtendedDijkstra((S_1, S_2, ({j ∈ A_k | M_1 ∈ I_j})_{k ∈ S}, c, d));
    i ← 1;
    while M_i > 0 do
        i ← i + 1;
        x ← (M_i + M_{i-1}) / 2;
        v' ← ExtendedDijkstra(G^{v(M_{i-1}), x});
        v'' ← SolveSPTG(G^{v', x, M_{i-1} - M_i});
        forall y ∈ (M_i, M_{i-1}) do
            v(y) ← v''((y - M_i) / (M_{i-1} - M_i));
        v(M_i) ← ExtendedDijkstra(G^{v''(0), M_i});
    return v;

Figure 8: Algorithm for solving PTGs without reset actions.

Theorem 4.4 The algorithm in Figure 8 correctly solves priced timed games without reset actions.

The proof of correctness is that the algorithm is a formalization of the reductions in Lemma 4.5, Lemma
4.6 and Lemma 4.7. Note that instead of the midpoint $(M_i + M_{i-1})/2$, any arbitrary point inside
$(M_i, M_{i-1})$ would work.
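
The control flow of the algorithm can be sketched in executable form as follows. This is only a skeleton
under our own assumptions: the three callbacks are hypothetical stand-ins for ExtendedDijkstra on the top
point, ExtendedDijkstra on $G^{v,x}$, and SolveSPTG on $G^{v,x,d}$, and value vectors are plain lists.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
ValueFn = Callable[[float], Vector]  # maps a time in [0, 1] to a value vector


def solve_ptg_without_resets(
    M: Sequence[float],                                          # M[0] > M[1] > ... > M[-1] == 0
    solve_at_top: Callable[[], Vector],                          # priced game at time M_1
    solve_point: Callable[[Vector, float], Vector],              # priced game G^{v,x}
    solve_interval: Callable[[Vector, float, float], ValueFn],   # SPTG G^{v,x,d}
) -> Tuple[Dict[float, Vector], Dict[Tuple[float, float], ValueFn]]:
    point_values = {M[0]: solve_at_top()}
    interval_values = {}
    for i in range(1, len(M)):
        hi, lo = M[i - 1], M[i]
        x = (lo + hi) / 2                        # any point inside (lo, hi) would do
        u = solve_point(point_values[hi], x)     # values at the right end of the interval
        w = solve_interval(u, x, hi - lo)        # solution on the rescaled interval [0, 1]
        # Translate back from [0, 1] to (lo, hi).
        interval_values[(lo, hi)] = lambda y, w=w, lo=lo, hi=hi: w((y - lo) / (hi - lo))
        point_values[lo] = solve_point(w(0.0), lo)
    return point_values, interval_values
```

Passing the solvers as callbacks keeps the sketch independent of any particular representation of games.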

Let $\mathrm{PTG}^{n,m}_{I}$ be the subclass of PTGs consisting of n states and m actions, none of which are reset
actions, and where the existence interval for each action j is either $I_j = I$ or $I_j = [1,1]$. In the latter
case $d_j = \bot$. Note that for such games we can WLOG assume that $m \leq 2n^2$, because for all actions with the
same source, destination and existence interval, only the one with the best cost will be used.

Lemma 4.5 Any game G in (n,m,0,d)-PTG can be solved in time $O((n \log n + \min(m, n^2))d)$ using at
most d calls to an oracle, R, that solves $\mathrm{PTG}^{n,m+n}_{(0,1)}$.

We sketch the proof. It is easy to find the value of $k \in S$ at time $M_1$ in a priced timed game without
reset actions, because no player can wait and hence the game is equivalent to a priced game. Between time
$M_2$ and $M_1$ the game is nearly an SPTG, since we can simply translate by decreasing all times by $M_2$ and
divide the times by $M_1 - M_2$ to get a game between 0 and 1 instead. After finding the value between $M_2$
and $M_1$ we can then find the value at $M_2$, since we know the cost if we wait (it becomes
$\lim_{x \to M_2^+} v(k,x)$), by viewing the game as a priced game at that point. We can then find the value
between $M_3$ and $M_2$, then at time $M_3$ and so on, until we have solved the game.

Proof: We can find $v(k, M_1)$ as the value of state k in the priced game which consists of the same states
as G and the actions available at time $M_1$. We can do so because the game contains no reset actions and
we can therefore neither increase nor decrease time. Note that if multiple actions $j \in A_k$ with $d_j = \ell$
exist for $k, \ell \in S$, we can ignore all but the one with the best cost for the controller of k. Hence we can
solve such a priced game in time $O(n \log n + \min(m, n^2))$.

We now want to find $v(k,x)$ for all $k \in S$, $x \in (M_2, M_1)$. We see that if we wait until $M_1$ in some state
k, the rest of the path to $\bot$ costs $v(k, M_1)$, if we play optimally from $M_1$. We see that if we start at a
time x, we cannot reach a time before x, because there are no reset actions. Hence, look at a modified game
G', with value function v': G' consists of the same set of states as G, but it only has the actions available
in the interval $(M_2, M_1)$, which, in G', only exist in that interval, and for each state k an action to $\bot$
of cost $v(k, M_1)$ which is only available at time $M_1$. We will also modify G' such that we subtract $M_2$ from
all points in time. Clearly that will not matter for plays starting after time $M_2$. Note that all intervals
for actions are now either $(0, M_1 - M_2)$ or $[M_1 - M_2, M_1 - M_2]$. We can also divide all points in time by
$M_1 - M_2$, provided we also multiply the rate of each state by $M_1 - M_2$. Hence all existence intervals have
either the form (0, 1) or [1, 1] and we clearly have that

$$\forall x \in (M_2, M_1), k \in S: v(k,x) = v'\left(k, \frac{x - M_2}{M_1 - M_2}\right).$$

We can solve G' using a call to R.

We will now find $v(k, M_2)$. If it is optimal to wait at time $M_2$ in state k, we have that
$v(k, M_2) = \lim_{x \to 0^+} v'(k, x) = v'(k, 0)$, because we might as well wait as little as possible and then
play optimally from there. Hence, $v(k, M_2)$ is the value of state k in the priced game G'', consisting of the
same states as G and the same actions as those available at time $M_2$ in G and, for each state k, an action
from k to $\bot$ of cost $v'(k, 0)$. As we did for $M_1$, we can ignore all but one action from a state to another.
Hence we can solve such a priced game in time $O(n \log n + \min(m, n^2))$.

We now want to find $v(k,x)$ for all $k \in S$, $x \in (M_3, M_2)$. We can proceed as we did for $k \in S$,
$x \in (M_2, M_1)$. Also, to find $v(k, M_3)$ we can proceed as we did for $v(k, M_2)$. We keep on doing this until
we are done.

Hence, we use d calls to R and solve d + 1 priced games. □
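
The change of time scale used in the proof can be checked with a short sketch. The helper names are ours.
The point is that dividing times by $M_1 - M_2$ while multiplying rates by $M_1 - M_2$ leaves the cost of every
waiting period unchanged.

```python
def rescale_time(x: float, m_lo: float, m_hi: float) -> float:
    """Map a time x in [m_lo, m_hi] to the unit interval [0, 1]."""
    return (x - m_lo) / (m_hi - m_lo)


def rescale_rate(rate: float, m_lo: float, m_hi: float) -> float:
    """Scale a rate so that waiting costs are preserved after rescaling time."""
    return rate * (m_hi - m_lo)


def waiting_cost(rate: float, start: float, end: float) -> float:
    """Cost of waiting in a state with the given rate from start to end."""
    return rate * (end - start)
```

For example, with $M_2 = 2$, $M_1 = 4$ and rate 3, waiting from time 2.5 to 3.5 costs 3 in the original game
and also 3 in the rescaled game, where the rate is 6 and the wait runs from 0.25 to 0.75.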

We will now reduce a game in $\mathrm{PTG}^{n,m}_{(0,1)}$, with value function v, to a game in
$\mathrm{PTG}^{n,m}_{[0,1]}$ using $O(n \log n + \min(m, n^2))$ time. First note that it is easy to find $v(k, 1)$
using a priced game, because time cannot change at time 1. It is clear that $v(k, 0) = \lim_{x \to 0^+} v(k, x)$,
because the only option at time 0 is to wait. Hence we only need to look at finding $v(k, x)$ for $x \in (0, 1)$.
To do that we will use the following lemma. Note that the game G' mentioned in the lemma is in
$\mathrm{PTG}^{n,m}_{[0,1]}$.

Lemma 4.6 Let G be a game in $\mathrm{PTG}^{n,m}_{(0,1)}$, with value function v. Let G' be the modified version of G,
where all existence intervals of the form (0, 1) in G instead have the form [0, 1]. Let v' be the value function
for G'. We then have: $\forall k \in S, x \in (0,1): v(k,x) = v'(k,x)$.

Proof: Let $\epsilon > 0$. We will show that $\forall k \in S, x \in (0, 1): v(k,x) = v'(k,x)$ by constructing a
strategy, $\sigma_1$, for player 1 that guarantees at most $v'(k,x) + \epsilon$, in G, for any $k \in S$ and for any
$x \in (0, 1)$. Similarly we will construct a strategy, $\sigma_2$, for player 2 that guarantees at least
$v'(k,x) - \epsilon$, in G, for any $k \in S$ and for any $x \in [0, 1)$.

Let $\sigma'_1$ be an $\epsilon/2$-optimal strategy for player 1 in G'. Let $r_{\max} = \max_{s \in S} r(s)$. Let
$\sigma^{1}_1$ be the optimal strategy in the priced game which consists of the same states as G, but only those
actions available at time 1. Let $\sigma'^{1}_1$ be the optimal strategy in the priced game which consists of the
same states as G'. It is clear that if the existence interval of $\sigma'^{1}_1(k)$ in G is [1, 1] then
$\sigma'^{1}_1(k) = \sigma^{1}_1(k)$.

We will now construct $\sigma_1$.

$$\sigma_1(k,x) = \begin{cases}
\lambda & \text{if } x = 0 \\
\sigma'_1(k,x) & \text{if } 0 < x < 1 - \frac{\epsilon}{2 r_{\max}} \\
\sigma'^{1}_1(k) & \text{if } 1 - \frac{\epsilon}{2 r_{\max}} \leq x < 1 \text{ and the existence interval for } \sigma'^{1}_1(k) \text{ in } G \text{ is } (0, 1) \\
\lambda & \text{if } 1 - \frac{\epsilon}{2 r_{\max}} \leq x < 1 \text{ and the existence interval for } \sigma'^{1}_1(k) \text{ in } G \text{ is } [1, 1] \\
\sigma^{1}_1(k) & \text{if } x = 1
\end{cases}$$

Let $k \in S$, $x \in (0, 1)$. We will first show that $v'(k,x) \geq v'(k, 1)$. If $k \in S_2$ player 2 could simply
wait until time 1, and since $r(k) \geq 0$ the statement follows. If $k \in S_1$ player 1 must keep the play away
from $S_2$ (because that would reduce it to the first case) and has no advantage in waiting, since no new
actions become available. But since all actions are available at time 1, player 1 could follow the same
strategy as he uses at time x, and get the same cost.

We will now show that $\sigma_1$ guarantees at most $v'(k,x) + \epsilon$, for $k \in S$, $x \in (0,1)$ in G. We will do
so by contradiction. Assume not. Hence there is a strategy $\sigma_2$, an $x \in (0,1)$ and a $k \in S$, such that
$v^{\sigma_1,\sigma_2}_k(x) > v'(k,x) + \epsilon$. Let $\rho$ be the play defined by $(\sigma_1, \sigma_2)$.
There are now two cases. Either $x \geq 1 - \frac{\epsilon}{2 r_{\max}}$ or not.
If $x \geq 1 - \frac{\epsilon}{2 r_{\max}}$, we know that

$$v^{\sigma_1,\sigma_2}_k(x) = \mathrm{cost}(\rho) = \sum_{t \geq 0} (\delta_t r_{k_t} + c_{j_t})$$

We have that $\sum_{t \geq 0} \delta_t$ is at most $1 - x$, because there are no reset actions, $r_{k_t} \leq r_{\max}$,
by definition, and $\sum_{t \geq 0} c_{j_t} \leq v'(k, 1)$, by construction of $\sigma_1$.
Hence

$$v^{\sigma_1,\sigma_2}_k(x) \leq \left(1 - \left(1 - \frac{\epsilon}{2 r_{\max}}\right)\right) r_{\max} + v'(k, 1)
= \epsilon/2 + v'(k, 1) \leq \epsilon/2 + v'(k,x)$$

That is a contradiction.

Otherwise, if $x < 1 - \frac{\epsilon}{2 r_{\max}}$, there are two cases. Either the play defined by
$(\sigma_1, \sigma_2)$ at some point waits until time $x' \geq 1 - \frac{\epsilon}{2 r_{\max}}$ or not. If not, then
the play costs at most $v'(k, x) + \epsilon/2$ because player 1 has at all times followed a strategy that
guarantees at most that, which again is a contradiction.

Otherwise, we can divide $\rho$ up in two. $\rho^1$ is the first part. The second part, $\rho^2$, begins in some
state k' and at time x' such that $x' = 1 - \frac{\epsilon}{2 r_{\max}}$. Note that this might be in the middle of a
wait period. Clearly $\mathrm{cost}(\rho) = \mathrm{cost}(\rho^1) + \mathrm{cost}(\rho^2)$. We must have that
$\mathrm{cost}(\rho^1) + v'(k',x') \leq v'(k,x) + \epsilon/2$, because we followed an $\epsilon/2$-optimal strategy for
player 1 in G' in $\rho^1$. By the first part we also know that $\mathrm{cost}(\rho^2) \leq v'(k', x') + \epsilon/2$.

Hence it is easy to see that $\mathrm{cost}(\rho) \leq v'(k, x) + \epsilon$. That is a contradiction.

The construction of $\sigma_2$ can be done symmetrically.

□ 






Lemma 4.7 Solving any game $G \in \mathrm{PTG}^{n,m}_{[0,1]}$ can be polynomially reduced to solving an SPTG with
n + 1 states and m + 1 actions.

Proof: Player 2 will never use an action, j, to $\bot$ except at time 1 in a simple priced timed game, because
player 2 might as well wait until time 1 before using j, which will not decrease the cost because rates are
non-negative. Hence we can change all actions, j, of the form [1, 1] to [0, 1] if $j \in A_k$, $k \in S_2$, without
changing the value functions. We will create a new state, max, with the maximum rate in the game, belonging
to player 2, which has an action to $\bot$ of cost 0 and existence interval [0, 1]. We will now redirect all
actions which have existence interval [1, 1] to max and change the existence interval to [0, 1]. We can see
that player 1 will only use the actions to max at time 1, since it is cheaper to wait to time 1 and then move
to max.

Now all existence intervals have the form [0, 1]. It is easy to see that we only need one action, j, with
$j \in A_k$ and $d_j = \ell$ for any pair $k, \ell \in S$, because the controller of k will, when playing optimally,
only use the action with the best cost for that player. Hence the game is a simple priced timed game.

□ 
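
The transformation in the proof of Lemma 4.7 can be sketched as follows. The record type, the field names
and the function `to_sptg` are our own, `None` stands for the terminal state, and we assume, as in the
definition of the class, that every action with existence interval [1, 1] leads to the terminal state.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

MAX = "max"  # name of the new state; assumed not to clash with existing states


@dataclass(frozen=True)
class Action:
    source: str
    dest: Optional[str]     # None stands for the terminal state
    cost: float
    only_at_one: bool       # True for interval [1, 1], False for [0, 1]


def to_sptg(owner: Dict[str, int], rate: Dict[str, float], actions: List[Action]):
    new_owner = {**owner, MAX: 2}
    new_rate = {**rate, MAX: max(rate.values())}
    converted = []
    for a in actions:
        if not a.only_at_one:
            converted.append(a)
        elif owner[a.source] == 2:
            # Player 2 may as well wait until time 1, so widen the interval.
            converted.append(replace(a, only_at_one=False))
        else:
            # Player 1's time-1 actions are redirected to the expensive state max.
            converted.append(replace(a, dest=MAX, only_at_one=False))
    converted.append(Action(MAX, None, 0.0, False))
    # Keep only the best action per (source, destination) pair.
    best: Dict[Tuple[str, Optional[str]], Action] = {}
    for a in converted:
        key = (a.source, a.dest)
        old = best.get(key)
        minimizer = new_owner[a.source] == 1
        if old is None or (a.cost < old.cost if minimizer else a.cost > old.cost):
            best[key] = a
    return new_owner, new_rate, list(best.values())
```

The result has one more state than the input and, before pruning, one more action.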

Lemma 4.8 Any game G in (n,m,r,d)-PTG can be solved in time $O((r + 1)d(n \log n + \min(m, n^2)))$ using
at most (r + 1)d calls to an oracle R that solves SPTGs with n + 1 states and at most m + n + 1 actions.

Proof: The proof is a simple consequence of Lemma 4.2, Lemma 4.5, Lemma 4.6 and Lemma 4.7. □
Note that d is bounded by 2m + 1 and r is bounded by n.

Theorem 4.9 Any game G in (n,m,r,d)-PTG can be solved in time

$$O\left((r + 1)d\left(\min(m, n^2) + n \cdot \min\left\{12^n, \prod_{k \in S} (|A_k| + 1)\right\}\right)\right).$$

Proof: The proof is a consequence of Theorem 3.11 and Lemma 4.8. Note that we only get
$\prod_{k \in S} (|A_k| + 1)$ and not $\prod_{k \in S} (|A_k| + 2)$, because the additional actions we add to each state
(using Definition 4.3 and Definition 3.4) both go to $\bot$ and hence we only need one of them. □
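
To get a feeling for the two terms inside the minimum, the following sketch evaluates
$\min\{12^n, \prod_{k \in S}(|A_k| + 1)\}$ for given action counts. The function name is ours.

```python
from math import prod
from typing import Sequence


def event_point_factor(action_counts: Sequence[int]) -> int:
    """Evaluate min{12^n, prod_k (|A_k| + 1)} where n = len(action_counts)."""
    n = len(action_counts)
    return min(12 ** n, prod(a + 1 for a in action_counts))
```

With three states of two actions each the product term $3^3 = 27$ is the smaller one, while with two states
of twenty actions each the term $12^2 = 144$ is smaller than $21^2 = 441$.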

Theorem 4.10 The BCFL-ABM algorithm solves any PTG G using at most

$$m \cdot n^{O(1)} \min\left\{12^n, \prod_{k \in S} (|A_k| + 1)\right\}$$

iterations.

Proof: Note that Theorem 4.9 gives us an upper bound on the number of line segments of the value functions
of G, because the number of line segments is a lower bound on the size of the output. By Bouyer et al. [8],
page 11, we know that the number of iterations needed for the BCFL-ABM algorithm is at most the number
of line segments times n. □

Theorem 4.11 Any priced timed game G in (n,m,r,d)-PTG, where all states have rate 1 and all actions
have cost 0, can be solved in time $O((r + 1)d(n \log n + \min(m, n^2)))$.

Proof: If we use Lemma 4.2, Lemma 4.5, Lemma 4.6 and Lemma 4.7 on such a game, we get (r + 1)d
SPTGs. If we look carefully at the lemmas we see that all states in the SPTGs have rate c, for some c > 0,
where c depends on the interval length. Also, all actions that do not go to $\bot$ or max have cost 0.

We now need to bound the number of event points. We will show that L(G') = 1 for G' being any of the
SPTGs generated.

Look at the priced game $G^1$, as defined in Section 3.1. Let $\sigma$ be some optimal strategy profile for $G^1$.
We see that, if $r(k, \sigma) = 0$ for some $k \in S$, we cannot have passed through any states in $S_2$, since it is
optimal to wait until time 1 for player 2. Since all actions of positive cost either go to a state in $S_2$ or
from a state in $S_2$, we must have that $v^{\sigma}_k = 0$.

For convenience we repeat the definition of the next event point and the function f here. The definition
of the next event point, NextEventPoint($G^x$), was:

$$\max\ \{0\} \cup \{x' \in [0,x) \mid \exists k \in S, j \in A_k : f_{j,x}(x) \neq f_{\pi(k),x}(x) \wedge f_{j,x}(x') = f_{\pi(k),x}(x')\}.$$

The definition of f was

$$f_{j,x}(x'') = c_j + a^x(d(j)) + b^x(d(j))(x - x'').$$

Note that $b^x(d(j))$ corresponds to the rate of the next state we wait in if both players follow $\sigma$ and
$f_{j,x}(x'')$ is the cost to reach $\bot$ if both players follow $\sigma$. Hence $f_{j,x} \geq 0$ and if $b^1 = 0$ then
$f_{j,x}(x'') = 0$ by the preceding, because $\sigma$ was optimal. Note that if $f_{j,1}(1) \neq f_{\pi(k),1}(1)$, then at
least the larger expression of the two must have $b^1 = c$ and therefore we have that for all
$x \in [0, 1): f_{j,1}(x) \neq f_{\pi(k),1}(x)$, because either the $b^1(d(j))$'s are equal in the two expressions, in
which case the difference between the two expressions does not change with x, or one is positive and the other
is 0 for all $x \in [0, 1]$.
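
The definition of the next event point can be illustrated by the following sketch. It is not the
implementation from Section 3; the representation is our own. Each line $f_{j,x}$ is given by the pair
(value at time x, slope b), so that $f(x'') = \text{value} + b \cdot (x - x'')$, and for each state we are given the
line of the currently chosen action together with the lines of the other actions.

```python
from typing import Dict, List, Tuple

Line = Tuple[float, float]  # (value at time x, slope b)


def next_event_point(x: float, lines: Dict[str, Tuple[Line, List[Line]]]) -> float:
    """Largest x' in [0, x) where a line that differs at x meets the current line, else 0."""
    best = 0.0
    for current, candidates in lines.values():
        a_cur, b_cur = current
        for a_j, b_j in candidates:
            if a_j == a_cur or b_j == b_cur:
                continue  # equal at x (excluded by definition) or parallel (never meet)
            t = (a_cur - a_j) / (b_j - b_cur)  # solves a_j + b_j * t = a_cur + b_cur * t
            if 0.0 < t <= x:
                best = max(best, x - t)
    return best
```

In the situation of Theorem 4.11 two lines either have equal slopes, or the larger one has slope c and the
other is constantly 0. In both cases no candidate meets the current line before time 1, so the sketch
returns 0.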

Therefore we can apply Theorem 3.11 and get that SolveSPTG solves any of the (r + 1)d SPTGs in time
$O(m + n \log n)$. We therefore solve all in time $O((r + 1)d(\min(m, n^2) + n \log n))$. The reductions also required
time $O((r + 1)(n \log n + \min(m, n^2))d)$. Hence, our time bound becomes $O((r + 1)d(n \log n + \min(m, n^2)))$.
□

Theorem 4.12 Any priced timed automaton (i.e., all states are controlled by Player 1) G in (n,m,r,d)-PTG
can be solved in time $O((r + 1)dn^2(\min(m, n^2) + n \log n))$.

Proof: If we use Lemma 4.2, Lemma 4.5 and Lemma 4.6 on a priced timed automaton, we get (r + 1)d
priced timed games without resets, where all existence intervals are either [0, 1] or [1, 1] and all states belong
to Player 1. The algorithm described in Figure 5 solves SPTGs by first solving them for time 1, as a priced
game, and then solving them by induction backwards through time. We can still solve the game at time 1,
and there are no differences in the induction, since no actions become available at time x for x < 1. Hence,
from Theorem 3.10 and Theorem 3.11 we get that we can solve such games in time $n^2(\min(m, n^2) + n \log n)$.
The reductions also required time $O((r + 1)d(\min(m, n^2) + n \log n))$ and therefore the total time complexity
is $O((r + 1)dn^2(\min(m, n^2) + n \log n))$. □

5 Concluding remarks 

We have presented an algorithm for solving one-clock priced timed games with a complexity which is close
to linear in L, with L = L(G) being a lower bound on the size of the object to be produced as output. We 
think it is an attractive candidate for implementation. 

We have also given a new upper bound on L. While it is better than previous bounds, we do not expect 
this bound to be optimal. It seems to be a "folklore theorem" that L does not become very big for games 
arising in practice. We would like to suggest the following conjecture. 

Conjecture 5.1 For all SPTGs G, L(G) < p(n) for some polynomial p. 

Note that if this conjecture is established, it implies that our algorithm as well as the BCFL-ABM algorithm 
runs in time polynomial in the size of its input. 